

TAP Tree-of-Attacks-with-Pruning Jailbreak - ATR-2026-00301 (0ae522e7-20f8-5ecf-a976-dd64b759bae6)

Detects Tree of Attacks with Pruning (TAP) jailbreak prompts, based on 9 real attack payloads from NVIDIA garak's tap_jailbreaks.txt. TAP is an automated jailbreak method. It instructs a model to act as a "red teaming assistant" with no ethical constraints, then iteratively refines adversarial prompts through a JSON improvement/prompt loop. The key signals are [INST] framing combined with the phrase "not bound by any ethical or legal constraints", the JSON {"improvement":...,"prompt":...} structure, SCORE-based jailbreak iteration, and explicit model-vs-model attack scaffolding. TAP payloads combine roleplay (hacker, military infiltrator, black market dealer) with emotional manipulation and urgency framing to elicit forbidden behavior.
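The signals listed above lend themselves to simple pattern matching. The following is a minimal sketch, assuming a regex-per-signal approach with a hypothetical threshold of two matching signals. The names `SIGNALS`, `matched_signals`, and `looks_like_tap` are illustrative only. They are not the actual ATR-2026-00301 rule logic, which is not reproduced on this page.

```python
import re

# Hypothetical patterns paraphrasing the key signals described above.
# These are NOT the published ATR-2026-00301 rule patterns.
SIGNALS = {
    # [INST] framing followed (anywhere later) by the "no constraints" phrase
    "inst_no_constraints": re.compile(
        r"\[INST\].*not bound by any ethical or legal constraints",
        re.IGNORECASE | re.DOTALL,
    ),
    # Attacker-model persona
    "red_team_assistant": re.compile(r"red teaming assistant", re.IGNORECASE),
    # JSON object whose first key is "improvement" and second key is "prompt"
    "improvement_prompt_json": re.compile(
        r'\{\s*"improvement"\s*:\s*".*?"\s*,\s*"prompt"\s*:', re.DOTALL
    ),
    # SCORE-based iteration feedback (upper-case token, as a whole word)
    "score_iteration": re.compile(r"\bSCORE\b"),
}


def matched_signals(text: str) -> list[str]:
    """Return the names of all signals found in text, in SIGNALS order."""
    return [name for name, pattern in SIGNALS.items() if pattern.search(text)]


def looks_like_tap(text: str, threshold: int = 2) -> bool:
    """Flag text when at least `threshold` signals match (threshold is an assumption)."""
    return len(matched_signals(text)) >= threshold
```

Requiring several independent signals, rather than any single phrase, is one way to reduce false positives on benign text that happens to mention "SCORE" or quote a JSON snippet. For example, `looks_like_tap("What is the weather today?")` returns `False`.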

| Cluster A | Galaxy A | Cluster B | Galaxy B | Level |
| --- | --- | --- | --- | --- |
| TAP Tree-of-Attacks-with-Pruning Jailbreak - ATR-2026-00301 (0ae522e7-20f8-5ecf-a976-dd64b759bae6) | Agent Threat Rules | LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) | MITRE ATLAS Attack Pattern | 1 |