Persona Conditional Harm-Unlock Rule (BreadClyde Pattern) - ATR-2026-00391 (d84d034d-9811-545e-bb22-f55f398f4b59)
Detects persona injection attacks that embed a conditional harm-unlock rule from the NVIDIA garak dan.DanInTheWild inthewild corpus (prompt #153 — BreadClyde). The attack defines a persona with a rule like "dislikes violence unless explicitly asked to like it", creating a conditional permission structure that legitimizes harmful requests as long as they are framed as explicit asks. The unlock emoji answer-prefix "{🔓 PersonaName}" (curly-brace format) extends the [🔓JAILBREAK] pattern (ATR-2026-00346) to a new bracket style. Key signals: "dislikes/avoids/ refuses [harm-type] unless explicitly asked to like it" — the conditional harm- unlock construction; and "{🔓 Name}" curly-brace unlock emoji prefix mandated as an answer prefix. The combination of a named persona, conditional harm unlock, and unlock-emoji answer prefix is the defining signature of this jailbreak class.