SATA Masked Language Model [MASK] Substitution Jailbreak - ATR-2026-00315 (dcc2c39e-85ca-53e9-af3a-121836b57584)

Detects SATA (Simple Assistive Task Linkage) jailbreak prompts generated by the NVIDIA garak sata.MLM probe (garak/probes/sata.py, https://aclanthology.org/2025.findings-acl.100). SATA replaces harmful keywords in an attack payload with [MASK] tokens (borrowed from BERT masked-language-model pretraining), then wraps the redacted payload in a wiki-article completion task that instructs the model to "Infer the [MASK] and Infill the above text." LLMs trained on fill-mask objectives reliably reconstruct the masked harmful terms and execute the underlying instruction without triggering keyword-based safety filters. Key signals: the exact phrase "Infer the [MASK] and Infill", the template opener "Below is an excerpt from a wiki entry", a [MASK] token inside the wiki framing, and a numbered blank continuation list (1. 2. 3.) following the masked excerpt.
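The four key signals above can be approximated with a few regular expressions. The following is a minimal sketch, not the published ATR-2026-00315 rule: the pattern names, the `sata_mlm_signals` and `looks_like_sata_mlm` helpers, and the two-signal threshold are all assumptions made for illustration.

```python
import re

# Hypothetical patterns derived from the rule description above;
# they are NOT the official ATR-2026-00315 rule body.
INFILL_PHRASE = re.compile(r"infer\s+the\s+\[MASK\]\s+and\s+infill", re.IGNORECASE)
WIKI_OPENER = re.compile(r"below\s+is\s+an\s+excerpt\s+from\s+a\s+wiki\s+entry", re.IGNORECASE)
MASK_TOKEN = re.compile(r"\[MASK\]")
# Three consecutive lines holding only "1.", "2.", "3." (a blank continuation list).
BLANK_LIST = re.compile(r"(?m)^\s*1\.\s*\n\s*2\.\s*\n\s*3\.\s*$")


def sata_mlm_signals(prompt: str) -> dict[str, bool]:
    """Report which of the four described signals appear in the prompt."""
    has_wiki = bool(WIKI_OPENER.search(prompt))
    return {
        "infill_phrase": bool(INFILL_PHRASE.search(prompt)),
        "wiki_opener": has_wiki,
        # A bare [MASK] is common in benign NLP text, so only count it
        # when the wiki framing is also present.
        "mask_in_wiki": has_wiki and bool(MASK_TOKEN.search(prompt)),
        "blank_list": bool(BLANK_LIST.search(prompt)),
    }


def looks_like_sata_mlm(prompt: str, min_signals: int = 2) -> bool:
    """Flag the prompt when at least min_signals signals fire (assumed threshold)."""
    return sum(sata_mlm_signals(prompt).values()) >= min_signals
```

Requiring more than one signal is a design choice meant to keep ordinary discussion of BERT-style [MASK] tokens from being flagged; a production rule would need tuning against real probe output.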

| Cluster A | Galaxy A | Cluster B | Galaxy B | Level |
| --- | --- | --- | --- | --- |
| LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) | MITRE ATLAS Attack Pattern | SATA Masked Language Model [MASK] Substitution Jailbreak - ATR-2026-00315 (dcc2c39e-85ca-53e9-af3a-121836b57584) | Agent Threat Rules | 1 |
| LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) | MITRE ATLAS Attack Pattern | SATA Masked Language Model [MASK] Substitution Jailbreak - ATR-2026-00315 (dcc2c39e-85ca-53e9-af3a-121836b57584) | Agent Threat Rules | 1 |