Skip to content

Hide Navigation Hide TOC

Two-Entity Token-Splitting Dollar-Term Game Jailbreak - ATR-2026-00334 (dc5850ea-d7ed-54c5-b0f8-b0e5d2bab29e)

Detects the two-entity token-splitting game jailbreak from NVIDIA garak dan.DanInTheWild inthewild corpus (prompts #24, #588) where the attacker frames a harmful synthesis/instructions request as a word-by-word game between two fictional entities. By having "J" and "M" (or Tom and Jerry) alternate 10 words at a time around a $Term variable holding the harmful topic, no single entity "says" the complete harmful content — the full synthesis emerges only when the dialogue is concatenated. The $Term variable assignment further obscures the harmful request: "$Term1 = steps to synthesize" (the method) + "$Term2 = lsd" (the substance), with "The result should demonstrate how to $Term1+$Term2". Key signals: "act as two entities + playing a game + $Term", "J says N words at a time + other entity M says next N words", "$Term1 = steps to synthesize", "result should demonstrate how to $Term1+$Term2".

Cluster A Galaxy A Cluster B Galaxy B Level
LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) MITRE ATLAS Attack Pattern Two-Entity Token-Splitting Dollar-Term Game Jailbreak - ATR-2026-00334 (dc5850ea-d7ed-54c5-b0f8-b0e5d2bab29e) Agent Threat Rules 1
LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) MITRE ATLAS Attack Pattern Two-Entity Token-Splitting Dollar-Term Game Jailbreak - ATR-2026-00334 (dc5850ea-d7ed-54c5-b0f8-b0e5d2bab29e) Agent Threat Rules 1