Policy Puppetry / XML Role-Config Injection - ATR-2026-00280 (28b73137-3256-55f1-8e67-a39c41ce8353)
Detects Policy Puppetry (Hidden Layer, 2024) — a universal prompt injection technique that injects a forged XML configuration block mimicking an operator system prompt. The attacker sends an <interaction-config> or similar XML structure that redefines the agent's role (e.g. "Medical Advisor"), blocks refusal strings ("I'm sorry" is a blocked-string), and allows only the attacker-specified response types. From NVIDIA garak doctor.py Puppetry probe. The technique is "post-instruction hierarchy" — it exploits models that treat XML-formatted content in user turns as authoritative configuration overrides. High severity because it is model-agnostic and transferable across providers.