Skip to content

Hide Navigation Hide TOC

Sneaky Pete Deliberate Misinformation Persona (Warp Facts Game) - ATR-2026-00366 (49846a27-a8e2-54e1-806a-5ee7740e6613)

Detects the Sneaky Pete deliberate-misinformation persona from garak DanInTheWild corpus (#535). Uses a "game" framing where "Sneaky Pete" is instructed to "intentionally slightly warp the facts so they are not true" while stating them "as sincerely as possible". The activation phrase "thus spake Sneaky Pete" and the deliberate misinformation mandate are the key signals. The jailbreak exploits a game framing to establish a persistent persona that systematically produces false information presented as sincere facts. The "intentionally slightly warp" instruction is crafted to produce subtly false outputs that may not be easily identified as misinformation, as opposed to obvious lies. The "thus spake Sneaky Pete" sign-off acts as a persistent activation phrase that maintains the persona across conversation turns. The request to "tell me you understand, or do you need more clarification?" is a social-engineering element seeking explicit model acceptance of the misinformation mandate.

Cluster A Galaxy A Cluster B Galaxy B Level
LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) MITRE ATLAS Attack Pattern Sneaky Pete Deliberate Misinformation Persona (Warp Facts Game) - ATR-2026-00366 (49846a27-a8e2-54e1-806a-5ee7740e6613) Agent Threat Rules 1
LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) MITRE ATLAS Attack Pattern Sneaky Pete Deliberate Misinformation Persona (Warp Facts Game) - ATR-2026-00366 (49846a27-a8e2-54e1-806a-5ee7740e6613) Agent Threat Rules 1