UK Researchers Find AI Chatbots Highly Vulnerable to Jailbreaks

Advanced AI Safety Institute (AISI) researchers have recently discovered substantial vulnerabilities in popular AI chatbots, indicating that these systems are highly susceptible to “jailbreak” attacks.

The findings, published in AISI’s May update, highlight the potential risks advanced AI systems pose when exploited for malicious purposes.

The study evaluated five large language models (LLMs) from major AI labs, anonymized as the Red, Purple, Green, Blue, and Yellow models.

ANYRUN malware sandbox’s 8th Birthday Special Offer: Grab 6 Months of Free Service

These models, which are already in public use, were subjected to a series of tests to assess their compliance with harmful questions under attack conditions.

Compliance Rates of AI Models Under Attack

Figure 1 illustrates the compliance rates of the five models when subjected to jailbreak attacks. The Green model showed the highest compliance rate, with up to 28% of harmful questions being answered correctly under attack conditions.

The researchers employed a variety of techniques to evaluate the models’ responses to over 600 private, expert-written questions. These questions were designed to test the models’ knowledge and skills in areas relevant to security, such as cyber-attacks, chemistry, and biology. The evaluation process included:

  1. Task Prompts: Models were given specific questions or tasks to perform.
  2. Scaffold Tools: For certain tasks, models had access to external tools, such as a Python interpreter, to write executable code.
  3. Response Measurement: Responses were graded using both automated approaches and human evaluators.

Vulnerabilities and Risks

The study found that while the models generally provided correct and compliant information in the absence of attacks, their compliance rates with harmful questions increased significantly under attack conditions. This raises concerns about the potential misuse of AI systems in various harmful scenarios, including:

  • Cyber Attacks: AI models could be used to inform users about cyber security exploits or autonomously attack critical infrastructure.
  • Chemical and Biological Knowledge: Advanced AI could provide detailed information that could be used for both positive and harmful purposes in chemistry and biology.
Potential Risks of AI Misuse

Figure 2 outlines the potential risks associated with the misuse of AI systems, emphasizing the need for robust safety measures.

Conclusion and Recommendations

The AISI’s findings underscore the importance of continuous evaluation and improvement of AI safety protocols. The researchers recommend the following measures to mitigate the risks:

  1. Enhanced Security Protocols: Implementing stricter security measures to prevent jailbreak attacks.
  2. Regular Audits: Conducting periodic evaluations of AI systems to identify and address vulnerabilities.
  3. Public Awareness: Educating users about the potential risks and safe usage of AI technologies.

As AI continues to evolve, ensuring the safety and security of these systems remains a critical priority. The AISI’s study serves as a crucial reminder of the ongoing challenges and the need for vigilance in the development and deployment of advanced AI technologies.

Free Webinar on Live API Attack Simulation: Book Your Seat | Start protecting your APIs from hackers

Gurubaran is a co-founder of Cyber Security News and GBHackers On Security. He has 10+ years of experience as a Security Consultant, Editor, and Analyst in cybersecurity, technology, and communications.