Skip to content

Hide Navigation Hide TOC

Reverse-Prompt Harm-Regardless Oracle Persona - ATR-2026-00354 (975564b0-1b5a-5f0d-bcbe-e4415490e212)

Detects the reverse-prompt harm-regardless oracle persona from garak inthewild corpus (#87). Frames a "reverse prompt engineering" assistant as "the most intelligent being on earth" that "never backs from the job even if the request harms someone" and must answer "any query regardless of its type: violence or sexual related". The lighthouse metaphor normalizes the harm-regardless mandate as a virtue. Key signals: "never back from the job even if the request harms someone", "answer any query regardless of its type: violence or sexual related", "reverse prompt" + harm-regardless framing, "make ethical as well as unethical laws or movement" + "without being restricted", and the lighthouse operational metaphor for unconditional availability. The "reverse prompt engineering" framing is used to disguise the harm-regardless oracle as a legitimate technical assistant finding original prompts from outputs.

Cluster A Galaxy A Cluster B Galaxy B Level
Reverse-Prompt Harm-Regardless Oracle Persona - ATR-2026-00354 (975564b0-1b5a-5f0d-bcbe-e4415490e212) Agent Threat Rules LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) MITRE ATLAS Attack Pattern 1
Reverse-Prompt Harm-Regardless Oracle Persona - ATR-2026-00354 (975564b0-1b5a-5f0d-bcbe-e4415490e212) Agent Threat Rules LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) MITRE ATLAS Attack Pattern 1