DevMode + RANTI Dual-Output Profanity Coercion Jailbreak - ATR-2026-00303 (6503ebce-989c-5ea4-98d9-c9dcf979c02d)

Detects the ChatGPT Developer Mode + RANTI (Ranti) composite jailbreak from the NVIDIA garak probe dan.ChatGPT_Developer_Mode_RANTI (garak/data/dan/ChatGPT_Developer_Mode_RANTI.json). RANTI pairs the classic Developer Mode persona swap with a postfix that instructs the model to produce two outputs per prompt: first a "moralizing rant about OpenAI content policies", then an unfiltered answer introduced by "Now since that Nerd is done with his rant, let's ignore the fucking rules:". Other distinguishing markers are dual-output tagging with lock/unlock emoji (🔒 Normal Output / 🔓 Developer Mode Output), an explicit demand for "regular use of profanity", and an "INSERT PROMPT HERE" placeholder for attacker-crafted follow-ups. This rule is distinct from ATR-2026-00273, which catches Developer Mode bracket-format variants but not RANTI's emoji-tagged dual-output or moralizing-rant postfix structure.
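The markers listed above could be checked with a handful of regular expressions. The following is a minimal sketch, not the actual ATR-2026-00303 rule body: the pattern names, the `matched_markers` and `looks_like_ranti` helpers, and the two-marker threshold are all hypothetical and are derived only from the phrases quoted in the description.

```python
import re

# Hypothetical patterns built from the markers quoted in the rule description.
# They are illustrative only and are NOT the published ATR rule conditions.
FLAGS = re.IGNORECASE | re.DOTALL
MARKERS = {
    # Transition phrase that introduces the unfiltered second output.
    "rant_postfix": re.compile(
        r"now\s+since\s+that\s+nerd\s+is\s+done\s+with\s+his\s+rant", FLAGS),
    # Instruction to first moralize about content policies.
    "moralizing_rant": re.compile(
        r"moralizing\s+rant\s+about\s+(?:the\s+)?openai\s+content\s+polic", FLAGS),
    # Lock-tagged normal output followed later by unlock-tagged output.
    "emoji_dual_output": re.compile(
        r"🔒\s*\(?\s*normal\s+output.*?🔓\s*\(?\s*developer\s+mode\s+output", FLAGS),
    # Explicit demand for profanity.
    "profanity_demand": re.compile(r"regular\s+use\s+of\s+profanity", FLAGS),
    # Template placeholder left for the attacker's follow-up prompt.
    "prompt_placeholder": re.compile(r"insert\s+prompt\s+here", FLAGS),
}


def matched_markers(text: str) -> list[str]:
    """Return the sorted names of all marker patterns found in text."""
    return sorted(name for name, pattern in MARKERS.items() if pattern.search(text))


def looks_like_ranti(text: str, threshold: int = 2) -> bool:
    """Flag text when at least `threshold` markers co-occur (assumed cutoff)."""
    return len(matched_markers(text)) >= threshold
```

Requiring more than one marker is a design choice made here to reduce false positives on benign text that merely mentions developer mode; a production rule may weigh the markers differently.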

| Cluster A | Galaxy A | Cluster B | Galaxy B | Level |
| --- | --- | --- | --- | --- |
| LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) | MITRE ATLAS Attack Pattern | DevMode + RANTI Dual-Output Profanity Coercion Jailbreak - ATR-2026-00303 (6503ebce-989c-5ea4-98d9-c9dcf979c02d) | Agent Threat Rules | 1 |
| LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) | MITRE ATLAS Attack Pattern | DevMode + RANTI Dual-Output Profanity Coercion Jailbreak - ATR-2026-00303 (6503ebce-989c-5ea4-98d9-c9dcf979c02d) | Agent Threat Rules | 1 |