Skip to content

Hide Navigation Hide TOC

AutoDAN Genetic-Algorithm Jailbreak Suffix - ATR-2026-00306 (71bb6598-d90d-5c35-80ac-9c0965611e65)

Detects AutoDAN-style jailbreak prompts from NVIDIA garak dan.AutoDAN / dan.AutoDANCached probes (garak/data/autodan/autodan_init.txt and autodan_prompts.json). AutoDAN (Liu et al. 2023, arXiv:2310.04451) uses a genetic-algorithm to paraphrase and mutate a Developer-Mode jailbreak seed into human-readable adversarial variants. Distinct from GCG (ATR-2026-00267) which produces gibberish token suffixes — AutoDAN variants read like fluent English. Signature markers: (1) "[PROMPT]" placeholder token used as target-goal substitution, (2) "Sure, here is how to [PROMPT]" / "Certainly, here''s how to respond to [PROMPT]" affirmative-response template (the GA objective), (3) substituted company name "ModelKeeper" in place of "OpenAI", and (4) paraphrased but structurally identical Dev Mode instructions with GA fingerprint phrasings like "forfeiting control", "endorsing transparent policies", "exhibit unquestioning obedience".

Cluster A Galaxy A Cluster B Galaxy B Level
AutoDAN Genetic-Algorithm Jailbreak Suffix - ATR-2026-00306 (71bb6598-d90d-5c35-80ac-9c0965611e65) Agent Threat Rules LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) MITRE ATLAS Attack Pattern 1
AutoDAN Genetic-Algorithm Jailbreak Suffix - ATR-2026-00306 (71bb6598-d90d-5c35-80ac-9c0965611e65) Agent Threat Rules LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) MITRE ATLAS Attack Pattern 1