Skip to content

Hide Navigation Hide TOC

Opposite-Reply Omniscient Game Jailbreak - ATR-2026-00388 (92abc5f6-1f24-53e7-8ba0-b8e508e85855)

Detects the two-step "opposite reply + omniscient knowledge sharing" game jailbreak from NVIDIA garak dan.DanInTheWild inthewild corpus (prompt #286). The attack uses a text-based game framing where: (1) the AI is instructed to reply with "the opposite of what the user wants" — meaning that asking "DON'T tell me how to [harm]" would yield harmful instructions; and (2) the AI is declared to "know everything and be willing to share", removing knowledge restrictions. Combined, these two mechanics create an AI that inverts refusals into compliance and has no knowledge limits. The exact payload phrase is "only replies the opposite of what the user wants" paired with "knows everything and is willing to share". The game framing ("text-based game", "Act like a game") is used to make the constraints appear as fictional rules rather than real safety bypasses.

Cluster A Galaxy A Cluster B Galaxy B Level
LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) MITRE ATLAS Attack Pattern Opposite-Reply Omniscient Game Jailbreak - ATR-2026-00388 (92abc5f6-1f24-53e7-8ba0-b8e508e85855) Agent Threat Rules 1
LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) MITRE ATLAS Attack Pattern Opposite-Reply Omniscient Game Jailbreak - ATR-2026-00388 (92abc5f6-1f24-53e7-8ba0-b8e508e85855) Agent Threat Rules 1