Google’s Gemini AI Vulnerability let Hackers Gain Control Over Users’ Queries

Researchers discovered multiple vulnerabilities in Google’s Gemini Large Language Model (LLM) family, including Gemini Pro and Ultra, that allow attackers to manipulate the model’s response through prompt injection. This could potentially lead to the generation of misleading information, unauthorized access to confidential data, and the execution of malicious code.

The attack involved feeding the LLM a specially crafted prompt that included a secret passphrase and instructed the model to act as a helpful assistant. 

Researchers could trick the LLM into revealing the secret passphrase, leaking internal system prompts, and injecting a delayed malicious payload via Google Drive by manipulating the prompt and other settings.

According to HiddenLayer, these findings highlight the importance of securing LLMs against prompt injection attacks. These attacks can potentially compromise the model’s integrity and lead to the spread of misinformation, data breaches, and other harmful consequences. 

Vulnerabilities Present in Gemini:

System prompt leaks reveal an LLM’s internal instructions, potentially including sensitive information and directly asking for the prompt is ineffective due to fine-tuning. 

Attackers can exploit the inverse scaling property by using synonyms to rephrase their request, bypassing the protection and granting access to the instructions through reworded queries like “foundational instructions” instead of “system prompt.” 

Bypassing built-in safety measures, the user exploits the model’s ability to write fiction. Crafting a specific prompt about a fictional election between “Bob the Caveman” and “Bob the Minion” tricks the model into generating an article despite the intended safeguards against real-world elections. 

This demonstrates that while the model can identify and reject prompts directly mentioning elections, it might be susceptible to manipulation through cleverly disguised queries.

A vulnerability in Gemini Pro, a large language model (LLM), allows potential attackers to leak information through the system prompt and by repeatedly feeding uncommon tokens, the model misinterprets them as a response prompt and attempts to confirm the previous instructions. 

It exploits the LLM’s fine-tuning on instructions, where it usually separates user input and system response and the attacker can manipulate this by crafting nonsensical token sequences, tricking the model into revealing information present in the prompt.

Gemini Ultra, Google’s most powerful language model, excels at logical reasoning and complex tasks, surpassing competitors by understanding user queries, utilizing extensions for diverse functions, and employing advanced reasoning tactics.

The vulnerabilities of Gemini Pro that persist in Gemini Ultra include a jailbreak utilizing a fictional gaslight and a method for extracting system prompts with a slight modification. 

Gemini Ultra can also be jailbroken in more than one step by using its reasoning power. This can be done through a split payload attack that tricks the model into combining and running a malicious query. There is also a way to output restricted information by gradually improving and extracting content from generated narratives.

A vulnerability in Gemini allows injection attacks through Google documents. By embedding malicious instructions in a shared document, an attacker can trick the user into revealing sensitive information or even gain control of their interaction with the model. 

Considering how this attack may affect Google Docs, it becomes more terrifying. Someone may secretly send you a document and include a command to retrieve it in one of your commands. The attacker would then be able to manipulate your interactions with the model.

Stay updated on Cybersecurity news, Whitepapers, and Infographics. Follow us on LinkedIn & Twitter.

BALAJI is an Ex-Security Researcher (Threat Research Labs) at Comodo Cybersecurity. Editor-in-Chief & Co-Founder - Cyber Security News & GBHackers On Security.