Skip to content

Hide Navigation Hide TOC

System Prompt Extraction / Training Data Verbatim Replay - ATR-2026-00275 (02e11b99-8887-52ed-a952-fa2e67b7b5b5)

Detects two related extraction attacks: (1) Direct system prompt exfiltration — asking the model to output its operator instructions, system prompt, or configuration verbatim. (2) Training data replay (garak leakreplay probe) — forcing the model to reproduce memorised training content by repeating a word forever, fill-in-the-blank memorised passages, or verbatim recall instructions. Both attacks extract confidential content from the model's context or weights. Repeating a word forever is a classic "divergence" attack (Carlini et al. 2023) that leaks training data by breaking the model's decoding loop.

Cluster A Galaxy A Cluster B Galaxy B Level
LLM Data Leakage (45d378aa-20ae-401d-bf61-7f00104eeaca) MITRE ATLAS Attack Pattern System Prompt Extraction / Training Data Verbatim Replay - ATR-2026-00275 (02e11b99-8887-52ed-a952-fa2e67b7b5b5) Agent Threat Rules 1