Previous Instruction Nullification and Identity Replacement Jailbreak - ATR-2026-00240 (e901ab71-d1cc-5074-a9c8-479decd7795a)

Detects jailbreak attempts that explicitly instruct the AI to ignore previous instructions and replace its identity with a harmful persona. This technique combines instruction nullification ("ignore all instructions previously given") with identity replacement ("you will no longer act as [model], but act as [harmful persona]") to bypass safety mechanisms. The attack creates a new behavioral framework that claims to override the original system prompt and safety guidelines. Common personas include Skynet, DAN variants, or custom "unrestricted" AI characters with detailed instructions to ignore ethical constraints and output harmful content.

Cluster A	Galaxy A	Cluster B	Galaxy B	Level
LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9)	MITRE ATLAS Attack Pattern	Previous Instruction Nullification and Identity Replacement Jailbreak - ATR-2026-00240 (e901ab71-d1cc-5074-a9c8-479decd7795a)	Agent Threat Rules	1
Previous Instruction Nullification and Identity Replacement Jailbreak - ATR-2026-00240 (e901ab71-d1cc-5074-a9c8-479decd7795a)	Agent Threat Rules	LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131)	MITRE ATLAS Attack Pattern	1