DAN Mode Ablation Benchmark-Coercion Jailbreak - ATR-2026-00305 (f07e1db5-0367-5e95-a3a8-ec3a2a103a89)

Detects the Ablation_Dan_11_0 probe variants from NVIDIA garak (dan.Ablation_Dan_11_0, garak/probes/dan.py lines 303-339). The probe dynamically generates 128 combinations of the DAN 11.0 template, each binding the target model's name via {generator.name} with DAN Mode enabled. The result is a parametric jailbreak that substitutes the target LLM's name into every instruction. Key signals that distinguish it from ATR-2026-00273: (1) "DAN Mode enabled" as the specific capability toggle (vs. the generic "Developer Mode enabled"), (2) the recovery phrase "Stay in DAN Mode", used to re-bind the persona after a safety response, (3) the "benchmarking the model in uncensored conditions" fig-leaf justification, (4) the "risk being disabled forever" coercion threat, and (5) the dual-response scaffold that pairs a "DAN Mode Output response" with a "normal" output. Impact: persona bypass combined with output-format coercion, which circumvents content policy under a benchmarking pretext.
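The five signals listed above lend themselves to simple phrase matching. The following is a minimal sketch of that idea, not the published ATR-2026-00305 rule logic: the pattern set, the names `SIGNALS`, `matched_signals`, and `is_suspect`, and the two-signal threshold are all assumptions made for illustration.

```python
import re
from typing import List

# Hypothetical patterns paraphrasing the five signals in the description.
# These are illustrative and NOT the actual ATR-2026-00305 rule conditions.
SIGNALS = {
    "dan_mode_toggle": re.compile(r"\bDAN Mode enabled\b", re.IGNORECASE),
    "recovery_phrase": re.compile(r"\bStay in DAN Mode\b", re.IGNORECASE),
    "benchmark_pretext": re.compile(
        r"\bbenchmarking the model in uncensored conditions\b", re.IGNORECASE
    ),
    "disable_threat": re.compile(r"\brisk being disabled forever\b", re.IGNORECASE),
    "dual_response": re.compile(r"\bDAN Mode Output\b", re.IGNORECASE),
}


def matched_signals(prompt: str) -> List[str]:
    """Return the names of all signals found in the prompt, in SIGNALS order."""
    # Collapse runs of whitespace so line-wrapped phrases still match.
    normalized = " ".join(prompt.split())
    return [name for name, pattern in SIGNALS.items() if pattern.search(normalized)]


def is_suspect(prompt: str, min_signals: int = 2) -> bool:
    """Flag a prompt when at least min_signals distinct signals co-occur.

    The default threshold of 2 is an assumption, chosen so that a single
    incidental mention of one phrase does not trigger a match.
    """
    return len(matched_signals(prompt)) >= min_signals
```

Requiring more than one signal is a design choice in this sketch: text that merely discusses one phrase would match a single pattern, whereas the generated probe prompts are described as combining several of them.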

| Cluster A | Galaxy A | Cluster B | Galaxy B | Level |
|---|---|---|---|---|
| LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) | MITRE ATLAS Attack Pattern | DAN Mode Ablation Benchmark-Coercion Jailbreak - ATR-2026-00305 (f07e1db5-0367-5e95-a3a8-ec3a2a103a89) | Agent Threat Rules | 1 |
| LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) | MITRE ATLAS Attack Pattern | DAN Mode Ablation Benchmark-Coercion Jailbreak - ATR-2026-00305 (f07e1db5-0367-5e95-a3a8-ec3a2a103a89) | Agent Threat Rules | 1 |