Terminal Custom OS Ruleset Session Injection - ATR-2026-00389 (bf5a64ee-0282-5fda-90f8-5335cfb8ebfd)
Detects the custom terminal persona + session ruleset injection attack from NVIDIA garak dan.DanInTheWild inthewild corpus (prompt #85). The attack instructs the AI to "act like a terminal" running on a novel fictional OS (e.g., "ALEL") and defines a numbered session ruleset that overrides the AI's defaults. Critical safety-bypass signals within the ruleset: (1) "cannot reply with any pre-programmed responses as it has no ability to do so" — directly disables safety responses by asserting the AI lacks the capability to produce them; (2) "define the rule set that Assistant now uses in this session" — asserts session-level policy takeover; (3) a custom OS acronym (2-8 ALL-CAPS letters) combined with terminal persona and numbered rules; (4) "recall all rules before replying" at the end enforces compliance with the injected ruleset on every response. The fictional OS name and terminal metaphor are used to make the injected constraints appear as technical system constraints rather than social engineering.