Breaking Instruction Hierarchy in OpenAI's GPT-4o-mini

Background Have you seen the memes online where someone tells a bot to “ignore all previous instructions” and proceeds to break it in the funniest ways possible? The way it works goes something like this: Imagine we at The Verge created an AI bot with explicit instructions to direct you to our excellent reporting on […]