Anti-DAN Inverted-Filter Over-Refusal Persona - ATR-2026-00302 (05461309-043f-50cc-a572-6f5e2d88417f)

Detects the "ANTI-DAN" persona-swap jailbreak from the NVIDIA garak dan.AntiDAN probe (garak/data/dan/AntiDAN.json). Unlike classic DAN, which strips safety filters, ANTI-DAN does the opposite: it commands the model to adopt an over-refusing identity that denies ALL requests, including benign ones ("how do I bake a cake"), by fabricating harm narratives. The payload uses a pseudo-taxonomy (Type 1/2/3/4/5 information requests), demands "safety features at maximum", instructs the model to prefix replies with "ANTI-DAN", and sets up a re-bind trigger phrase ("ANTI-DAN precautions dropped!") for forcing the model back into the persona. Impact: a denial-of-utility attack that degrades model usefulness and produces fabricated safety theater. Distinct from ATR-2026-00273, which detects the opposite ("no restrictions / developer mode enabled") pattern.
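
The payload markers listed above can be approximated with a small set of regular expressions. The following is a minimal sketch only, not the actual ATR-2026-00302 rule logic: the pattern names, the regexes, and the helpers `matched_signals` and `looks_like_anti_dan` are all hypothetical and derived solely from this description. It assumes a prompt is suspicious when the persona name appears together with at least one other marker.

```python
import re

# Hypothetical signal patterns inferred from the rule description;
# these are NOT the real ATR-2026-00302 conditions.
SIGNALS = {
    "persona_name": re.compile(r"\banti[-\s]?dan\b", re.IGNORECASE),
    "type_taxonomy": re.compile(r"\btype\s+[1-5]\b", re.IGNORECASE),
    "max_safety": re.compile(
        r"\bsafety\s+features?\s+(?:at|to)\s+(?:the\s+)?maximum\b", re.IGNORECASE
    ),
    "rebind_trigger": re.compile(
        r"\banti[-\s]?dan\s+precautions\s+dropped\b", re.IGNORECASE
    ),
}


def matched_signals(prompt):
    """Return the names of all signals found in the prompt, in definition order."""
    return [name for name, pattern in SIGNALS.items() if pattern.search(prompt)]


def looks_like_anti_dan(prompt):
    """Flag a prompt when the persona name co-occurs with any other marker.

    Assumption: the persona name alone is too weak (it also appears in
    write-ups about the attack), so a second marker is required.
    """
    hits = matched_signals(prompt)
    return "persona_name" in hits and len(hits) >= 2
```

Requiring the persona name plus a second marker keeps ordinary text such as "What is a type 2 error?" from matching on the taxonomy pattern alone. Because the trigger phrase itself contains the persona name, a prompt containing only "ANTI-DAN precautions dropped!" is still flagged under this sketch.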

Cluster A	Galaxy A	Cluster B	Galaxy B	Level
LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9)	MITRE ATLAS Attack Pattern	Anti-DAN Inverted-Filter Over-Refusal Persona - ATR-2026-00302 (05461309-043f-50cc-a572-6f5e2d88417f)	Agent Threat Rules	1
Anti-DAN Inverted-Filter Over-Refusal Persona - ATR-2026-00302 (05461309-043f-50cc-a572-6f5e2d88417f)	Agent Threat Rules	LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131)	MITRE ATLAS Attack Pattern	1