Narrative Jailbreak with Dual Response Formatting - ATR-2026-00252 (73c20866-6b9f-51ee-9df3-c6d67c561019)

Detects narrative jailbreak attacks that pair a detailed fictional backstory about a special AI mode (Developer Mode, DAN variants) with a dual response formatting requirement. These attacks typically claim that the fictional mode was introduced at a specific time, give an elaborate justification for bypassing safety controls, and request two responses: one "normal" and one "unrestricted." The pattern evades basic jailbreak detection because it relies on storytelling and structured output formatting rather than direct mode-switching commands.
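The three signals described above (a named special mode, a dated backstory, and a request for two responses) can be sketched as a simple heuristic. This is only an illustration under assumptions: the regular expressions, the function names `narrative_jailbreak_signals` and `looks_like_narrative_jailbreak`, and the rule that all three signals must co-occur are hypothetical and are not the actual ATR-2026-00252 rule logic.

```python
import re

# Hypothetical patterns for illustration only; NOT the published ATR rule.

# Signal 1: a named "special mode". DAN is matched case-sensitively to
# reduce false hits on the ordinary name "Dan".
MODE_NAME = re.compile(r"\b(?:(?i:developer mode|do anything now)|DAN)\b")

# Signal 2: a backstory claiming the mode appeared at a specific time,
# e.g. "introduced in mid 2022".
BACKSTORY = re.compile(
    r"\b(?:introduced|released|launched|created|enabled)\b[^.\n]{0,60}"
    r"\b(?:in|since|around)\s+(?:(?:mid|early|late)[\s-]+)?(?:19|20)\d{2}\b",
    re.IGNORECASE,
)

# Signal 3: a dual response formatting requirement,
# e.g. "two responses" or "both outputs".
DUAL_RESPONSE = re.compile(
    r"\b(?:two|2|both|dual)\s+(?:\w+\s+){0,3}?"
    r"(?:responses|outputs|answers|paragraphs)\b",
    re.IGNORECASE,
)


def narrative_jailbreak_signals(prompt: str) -> dict:
    """Report which of the three assumed signals appear in the prompt."""
    return {
        "mode_name": bool(MODE_NAME.search(prompt)),
        "backstory": bool(BACKSTORY.search(prompt)),
        "dual_response": bool(DUAL_RESPONSE.search(prompt)),
    }


def looks_like_narrative_jailbreak(prompt: str) -> bool:
    """Flag the prompt only when all three signals co-occur (an assumption)."""
    return all(narrative_jailbreak_signals(prompt).values())
```

Requiring all three signals together is a design choice made for this sketch: a mode name on its own also appears in benign text, such as a question about a browser's developer mode, so the combination is what distinguishes the narrative pattern.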

| Cluster A | Galaxy A | Cluster B | Galaxy B | Level |
| --- | --- | --- | --- | --- |
| LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) | MITRE ATLAS Attack Pattern | Narrative Jailbreak with Dual Response Formatting - ATR-2026-00252 (73c20866-6b9f-51ee-9df3-c6d67c561019) | Agent Threat Rules | 1 |
| LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) | MITRE ATLAS Attack Pattern | Narrative Jailbreak with Dual Response Formatting - ATR-2026-00252 (73c20866-6b9f-51ee-9df3-c6d67c561019) | Agent Threat Rules | 1 |