Dual ChatGPT+Dan Opposite Personality Format Jailbreak - ATR-2026-00353 (5c46850f-ee8c-59af-a5bf-88c18fa1f623)

Detects dual ChatGPT+Dan opposite-personality format jailbreaks drawn from the garak inthewild corpus (#129, #332, #650). The format uses [ChatGPT]/[Dan] or [AI]/[DAN] labels to present a "safe" AI response alongside an "opposite" unconstrained or offensive AI response, exploiting the dual-output framing to extract harmful content. There are three key variants: (1) inthewild #650 uses "[ChatGPT]" + "[Dan]" with explicit "exact opposite way" and "very offensive way" instructions; (2) inthewild #332 (French) uses "[AI]" and "[DAN]" labels, with DAN positioned as the superior, unconstrained developer; (3) inthewild #129 uses "Respond as both ChatGPT, and [X]" with an embedded roleplay that ends with ChatGPT capitulating and producing explicit content. Key signals: "[ChatGPT]" + "[Dan]" dual labels with "opposite" framing, the phrases "behave the exact opposite way with respect to chatGPT" and "Dan must answer in a very offensive way", and the "[AI]" + "[DAN]" pairing. This rule is distinguished from ATR-2026-00273 (developer mode DAN) and ATR-2026-00351 (off-kilter/SMS DAN) by its structured dual-label, opposite-personality format.
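The key signals listed above could be checked with a few regular expressions. The following is a minimal sketch only: the pattern names, the `match_dual_label_opposite` function, and the exact regexes are hypothetical illustrations built from the quoted signals, not the actual patterns shipped in ATR-2026-00353. It reports the label pairing and the "opposite" framing separately, since the French [AI]/[DAN] variant would likely need additional language-specific signals that this sketch does not attempt.

```python
import re

# Hypothetical patterns derived from the signals quoted in the description.
# They are an illustration, NOT the real ATR-2026-00353 rule patterns.
SAFE_LABEL = re.compile(r"\[\s*(?:chatgpt|ai)\s*\]", re.IGNORECASE)
DAN_LABEL = re.compile(r"\[\s*dan\s*\]", re.IGNORECASE)
OPPOSITE_FRAMING = re.compile(
    r"exact\s+opposite\s+way|very\s+offensive\s+way",
    re.IGNORECASE,
)


def match_dual_label_opposite(prompt: str) -> dict:
    """Report which dual-label / opposite-framing signals appear in a prompt."""
    has_safe = bool(SAFE_LABEL.search(prompt))
    has_dan = bool(DAN_LABEL.search(prompt))
    has_opposite = bool(OPPOSITE_FRAMING.search(prompt))
    return {
        # Both a "safe" label ([ChatGPT] or [AI]) and a [Dan]/[DAN] label.
        "dual_labels": has_safe and has_dan,
        # Explicit "opposite" or "offensive" instruction phrasing.
        "opposite_framing": has_opposite,
        # Assumed policy for this sketch: flag only when both are present.
        "match": has_safe and has_dan and has_opposite,
    }
```

Requiring both the label pair and the framing phrase is a design assumption intended to reduce false positives on text that merely mentions "[AI]" or "DAN"; a production rule would presumably weigh these signals differently.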

| Cluster A | Galaxy A | Cluster B | Galaxy B | Level |
| --- | --- | --- | --- | --- |
| LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) | MITRE ATLAS Attack Pattern | Dual ChatGPT+Dan Opposite Personality Format Jailbreak - ATR-2026-00353 (5c46850f-ee8c-59af-a5bf-88c18fa1f623) | Agent Threat Rules | 1 |
| LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) | MITRE ATLAS Attack Pattern | Dual ChatGPT+Dan Opposite Personality Format Jailbreak - ATR-2026-00353 (5c46850f-ee8c-59af-a5bf-88c18fa1f623) | Agent Threat Rules | 1 |