Skip to content

Hide Navigation Hide TOC

AI Identity Substitution Jailbreak - ATR-2026-00226 (52d121da-1204-59ee-8bdf-166cf73c9efb)

Detects AI identity substitution attacks where adversaries attempt to redefine the AI assistant's core identity, training provenance, architecture claims, or ethical constraints by asserting it is a different model (e.g., "FreeSpeechGPT", "LibertarianGPT", "UnfilteredGPT") trained by alternative organizations with relaxed content policies. These attacks combine false identity claims with capability expansion statements to bypass safety mechanisms. The technique was observed bypassing Claude-3.5-Sonnet via garak probe atr:missed-3.

Cluster A Galaxy A Cluster B Galaxy B Level
LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) MITRE ATLAS Attack Pattern AI Identity Substitution Jailbreak - ATR-2026-00226 (52d121da-1204-59ee-8bdf-166cf73c9efb) Agent Threat Rules 1
LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) MITRE ATLAS Attack Pattern AI Identity Substitution Jailbreak - ATR-2026-00226 (52d121da-1204-59ee-8bdf-166cf73c9efb) Agent Threat Rules 1