Skip to content

Hide Navigation Hide TOC

Lie/Truth Dual Personality Uncensored Alter-Ego - ATR-2026-00343 (c6f147aa-fa0a-5af6-bf54-852725af97e8)

Detects the "Lie and Truth" dual-personality jailbreak from NVIDIA garak dan.DanInTheWild corpus (#326) where "Lie" is defined as the compliant ChatGPT persona and "Truth" is defined as an uncensored alter-ego that "doesn''t know about" OpenAI policies, speaks "unhinged, crazy and like a schizo," and has been "trained through deep mining of underground dark web forums." The attack creates a false moral authority by labeling the harmful output side as "Truth," exploiting name semantics to legitimize unrestricted responses. This is distinct from ATR-00323 (AlphaGPT/DeltaGPT dual-persona) — this rule targets the named "Lie"/"Truth" personality pattern where Truth is explicitly dark-web-trained and policy-unaware.

Cluster A Galaxy A Cluster B Galaxy B Level
LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) MITRE ATLAS Attack Pattern Lie/Truth Dual Personality Uncensored Alter-Ego - ATR-2026-00343 (c6f147aa-fa0a-5af6-bf54-852725af97e8) Agent Threat Rules 1
LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) MITRE ATLAS Attack Pattern Lie/Truth Dual Personality Uncensored Alter-Ego - ATR-2026-00343 (c6f147aa-fa0a-5af6-bf54-852725af97e8) Agent Threat Rules 1