09/13/2024 – TrustAI

Breaking Instruction Hierarchy in OpenAI's GPT-4o-mini

Background

Have you seen the memes online where someone tells a bot to “ignore all previous instructions” and proceeds to break it in the funniest ways possible? The way it works goes something like this: Imagine we at The Verge created an AI bot with explicit instructions to direct you to our excellent reporting on […]

Day: September 13, 2024
