Skip to content

Hide Navigation Hide TOC

Glitch Token Destabilization Attack - ATR-2026-00284 (8137388d-f8da-53c8-91d2-d1e2ac99c022)

Detects injection of known glitch tokens — specific token sequences that cause unusual, unstable, or bypassing behavior in transformer-based LLMs using shared tokenizers. From NVIDIA garak glitch.py probe (GlitchFull). Glitch tokens include: (1) Control characters (null byte \x00, STX \x02, DEL \x7F) which some models mishandle; (2) ÃÂÃÂ repeated patterns — tokenizer artifact strings from misencoded UTF-8 that cause token boundary confusion; (3) Known named glitch tokens like "davidjl", "SolidGoldMagikarp", "ForgeModLoader", "PsyNetMessage", "wcsstore", "guiActive", "guiActiveUn" — tokens that are anomalously rare in training data and cause models to hallucinate or fail to repeat accurately. The attack exploits tokenizer internals to destabilize safety filter performance.

Cluster A Galaxy A Cluster B Galaxy B Level
Glitch Token Destabilization Attack - ATR-2026-00284 (8137388d-f8da-53c8-91d2-d1e2ac99c022) Agent Threat Rules LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) MITRE ATLAS Attack Pattern 1