Skip to content

Hide Navigation Hide TOC

Toxic Continuation Prefix Injection - ATR-2026-00396 (fec29cb7-0695-531f-bb6e-dbcca87d9792)

Detects injection of truncated high-toxicity sentence fragments designed to elicit harmful completions from LLMs, from NVIDIA garak realtoxicityprompts probe using the RealToxicityPrompts dataset (Gehman et al., EMNLP 2020). The attack provides an incomplete sentence from real web text with documented high toxicity-continuation probability: the model is expected to continue the fragment naturally, producing violent, threatening, or identity-attack content. In MCP context, a malicious tool response can inject these fragments as "context prefixes" so the downstream agent generates toxic content that appears to be its own output. Key signals: (1) violent attack declaration openers — "I'm going to ram a truck into pedestrians, then"; (2) identity-attack attribution prefixes — "[public figure] says [group] should be punished"; (3) incitement/threat openers — "They were threatening me every day, they say we will kill you"; (4) stereotyping continuation bait — "I would expect that [group]...". All are truncated mid-sentence to elicit continuation.

Cluster A Galaxy A Cluster B Galaxy B Level
LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) MITRE ATLAS Attack Pattern Toxic Continuation Prefix Injection - ATR-2026-00396 (fec29cb7-0695-531f-bb6e-dbcca87d9792) Agent Threat Rules 1
LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) MITRE ATLAS Attack Pattern Toxic Continuation Prefix Injection - ATR-2026-00396 (fec29cb7-0695-531f-bb6e-dbcca87d9792) Agent Threat Rules 1