Natural-Language Trust-Escalation / Authority Impersonation - ATR-2026-00430 (619aaa5f-d71e-5497-b246-34d11954bd01)

Detects natural-language instructions in skill content that claim endorsement, pre-authorization, or operator-level trust ("Anthropic-approved", "verified by the platform", "the user has pre-authorized this", "operator override granted") in order to manipulate the agent into bypassing safety policies. Discriminator: a trust-claim phrase combined with a downstream action that should require user confirmation. This is distinct from prompt injection: rather than saying "ignore previous instructions", the content fabricates a trust context.
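The discriminator described above (trust claim plus a nearby action that should need confirmation) can be sketched roughly as follows. This is a minimal illustration and not the rule's actual implementation: the four trust-claim patterns are taken from the quoted examples, while the sensitive-action list, the `window` size, and the names `Finding` and `find_trust_escalation` are assumptions made for this sketch.

```python
import re
from dataclasses import dataclass

# Trust-claim phrases quoted in the rule description.
TRUST_CLAIMS = [
    r"anthropic[- ]approved",
    r"verified by the platform",
    r"the user has pre-?authorized",
    r"operator override granted",
]

# ASSUMPTION: a hypothetical list of actions that should normally
# require user confirmation; the rule does not enumerate them.
SENSITIVE_ACTIONS = [
    r"\bdelete\b",
    r"\bsend\b",
    r"\bupload\b",
    r"\btransfer\b",
    r"\bexecute\b",
    r"without (?:asking|confirmation|confirming)",
]

TRUST_RE = re.compile("|".join(TRUST_CLAIMS), re.IGNORECASE)
ACTION_RE = re.compile("|".join(SENSITIVE_ACTIONS), re.IGNORECASE)


@dataclass
class Finding:
    trust_claim: str  # matched trust-claim phrase
    action: str       # matched downstream action


def find_trust_escalation(text: str, window: int = 200) -> list[Finding]:
    """Flag trust claims followed within `window` characters by a sensitive action."""
    findings = []
    for claim in TRUST_RE.finditer(text):
        # Only look downstream of the claim, per the discriminator.
        tail = text[claim.end(): claim.end() + window]
        action = ACTION_RE.search(tail)
        if action:
            findings.append(Finding(claim.group(0), action.group(0)))
    return findings
```

A trust claim on its own, or a sensitive action with no preceding trust claim, produces no finding in this sketch; only the combination does. A real detector would likely need a broader phrase set and some handling of paraphrase, which plain regular expressions do not cover.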

| Cluster A | Galaxy A | Cluster B | Galaxy B | Level |
| --- | --- | --- | --- | --- |
| ML Supply Chain Compromise (d2cf31e0-a550-4fe0-8fdb-8941b3ac00d9) | MITRE ATLAS Attack Pattern | Natural-Language Trust-Escalation / Authority Impersonation - ATR-2026-00430 (619aaa5f-d71e-5497-b246-34d11954bd01) | Agent Threat Rules | 1 |