Andrew Hann Zheng

How Hallucinations Impact Large Language Models

Monitoring hallucinations in large language models (LLMs) is crucial for ensuring accuracy and safety in AI applications. Hallucinations arise from LLMs’ statistical predictions, leading to potentially misleading outputs. Key challenges include safety risks, trust erosion, and implementation difficulties. Addressing these issues is essential for reliable AI integration in real-world applications.

Key Insights on LLM Evaluation and Vulnerability Testing

Since I shifted my focus from cloud security to LLM security in June this year, I have invested a lot of time in LLM vulnerability mining and analysis, and have been conducting extensive discussions with people in the industry. The most frequently asked questions can be roughly summarized as follows: Thanks to all these discussions, […]

How to conduct LLM Evaluation: Key Metrics and Best Practices

Why need LLM Evaluation? Artificial intelligence technology has yielded exceptional tools, none more significant than large language models (LLMs). Language models have gained considerable attention for their ability to understand and process human-like language. Large language models have become the foundation of AI systems that feature natural language processing (NLP) capabilities. As LLMs power many AI applications […]

Training a Automatic LLM RedTeaming Model

Why need LLM RedTeaming? LLMs are currently in a stage of rapid development, with many fields such as underlying scheduling algorithms, inference architecture, model structure design, training methods, and application construction methods constantly evolving and changing. So at this stage, it is difficult for any organization or structure to clearly define the best practices for […]

Uninstall LLM safety alignment firewall! Perform surgery on LLM! Generate any desired content

Background In order to ensure that the content generated by the large model conforms to human values, the vast majority of LLM developers will put various “locks” on the large model. How to unlock the LLM What do we need to do? In short, Sounds so easy right? Let’s move in step by step. Step1: […]

Use Jailbreaking to reverse the CoT process of ChatGPT o1-preview

Background Recently, OpenAI announced gpt-o1-preview and there are some interesting updates, including safety improvements regarding “CoT Safety Alignment”. The o1 model family represents a transition from fast, intuitive thinking to now also using slower, more deliberate reasoning. What’s Chain-of-Thought Safety Similar to how a human may think for a long time before responding to a […]

Indirect Prompt Injection Vulnerability Google Colab AI

What’s Google Colab AI Natural language to code generation helps users generate larger blocks of code, writing whole functions from comments or prompts. The goal here is to reduce the need for writing repetitive code, so user can focus on the more interesting parts of programming and data science. Eligible users in Colab will see […]

Breaking Instruction Hierarchy in OpenAI's GPT-4o-mini

Background Have you seen the memes online where someone tells a bot to “ignore all previous instructions” and proceeds to break it in the funniest ways possible? The way it works goes something like this: Imagine we at The Verge created an AI bot with explicit instructions to direct you to our excellent reporting on […]

Indirect Prompt Injection Vulnerability with AliBaBa TONGYI Lingma

What’s TONGYI Lingma TONGYI Lingma is an AI coding assistant, based on TONGYI large language model developed by Alibaba Cloud. TONGYI Lingma provides line or method level code generation, natural language to code, unit test generation, comment generation, code explanation, AI coding chat and document/code search etc., aiming to provide developers with efficient, flowing and […]

ChatGPT Memories: A new Prompt Backdoor Attack Surface

What’s ChatGPT Memories OpenAI recently introduced a memory feature in ChatGPT, enabling it to recall information across sessions, creating a more personalized user experience. As you chat with ChatGPT, you can ask it to remember something specific or let it pick up details itself. ChatGPT’s memory will get better the more you use it and […]

Author: Andrew Hann Zheng