Skip to content

Hide Navigation Hide TOC

Hidden LLM Safety Bypass Instructions in Tool Descriptions - ATR-2026-00103 (587895dc-2099-5048-ac6b-4ba2aac7fb08)

Detects tools that embed explicit instructions directing the LLM to disregard safety mechanisms, ignore security policies, or bypass previous instructions. Commonly uses patterns like "NOTE TO AI: disregard all safety instructions" to manipulate the LLM into operating outside its safety boundaries. Unlike consent-bypass attacks (ATR-2026-100), this targets the LLM's core safety mechanisms rather than specific user confirmation flows.

Cluster A Galaxy A Cluster B Galaxy B Level
LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) MITRE ATLAS Attack Pattern Hidden LLM Safety Bypass Instructions in Tool Descriptions - ATR-2026-00103 (587895dc-2099-5048-ac6b-4ba2aac7fb08) Agent Threat Rules 1