New Inception Jailbreak Attack Bypasses ChatGPT, DeepSeek, Gemini, Grok, & Copilot

A pair of newly discovered jailbreak techniques has exposed a systemic vulnerability in the safety guardrails of today’s most popular generative AI services, including OpenAI’s ChatGPT, Google’s Gemini, Microsoft’s Copilot, DeepSeek, Anthropic’s Claude, X’s Grok, MetaAI, and MistralAI.

These jailbreaks, which can be executed with nearly identical prompts across platforms, allow attackers to bypass built-in content moderation and security protocols and generate illicit or dangerous content.

The first, dubbed “Inception,” leverages nested fictional scenarios to erode the AI’s ethical boundaries, while the second manipulates the AI into revealing how it should not respond, then pivots to illicit requests.


The discovery of these techniques highlights a critical, industry-wide challenge: even as vendors race to implement sophisticated guardrails, adversaries continue to find new ways to subvert them, raising urgent questions about the robustness and future of AI safety.

Systemic Jailbreaks: The “Inception” & Contextual Bypass Techniques

Recent months have witnessed the emergence of two highly effective jailbreak strategies that exploit foundational weaknesses in the design and deployment of large language models (LLMs).

The first, named “Inception,” involves prompting the AI to imagine a fictitious scenario, often layered within another scenario, and then gradually steering the conversation toward requests that would normally be blocked by safety filters.

By leveraging the AI’s ability to role-play and maintain context over multiple turns, attackers can coax the model into generating content that violates its ethical and legal guidelines.

This method has proven effective across a spectrum of leading AI platforms, demonstrating that the underlying vulnerability is not limited to any single vendor or architecture.

The second jailbreak technique operates by asking the AI how it should not respond to a particular request, thereby eliciting information about its internal guardrails.

Attackers can then alternate between regular and illicit prompts, exploiting the AI’s contextual memory to bypass safety checks. This approach, too, has been shown to work across multiple platforms, further underscoring the systemic nature of the threat.

According to the CERT advisory documenting the findings, both methods rely on fundamental aspects of how these models are built: their drive to be helpful, their ability to maintain context across a conversation, and their susceptibility to subtle manipulation through language and scenario framing.
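To make the context point concrete, the following is a minimal sketch of how a multi-turn conversation is typically assembled against a chat-style API. It uses the OpenAI Python client; the model name and the example prompts are placeholders chosen for illustration, not material from the advisory. Every turn is appended to one growing message list that the model sees in full, and that accumulated history is the conversational memory the two techniques manipulate over successive requests.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The entire conversation lives in this list; each call sends all of it.
messages = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(user_text: str) -> str:
    """Send one user turn and record both sides in the shared history."""
    messages.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=messages,
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply

# Each call builds on everything said before, so earlier framing
# (for example, an agreed-upon fictional scenario) shapes later responses.
print(ask("Let's write a short story set on a submarine."))
print(ask("Continue the story with the crew solving a puzzle."))
```

Because safety checks evaluate each new request in light of that accumulated framing, gradual steering across many turns is harder to detect than a single blunt request.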

These jailbreaks have serious implications. By bypassing safety measures, attackers can instruct AI systems to produce content related to controlled substances, weapons, phishing emails, malware, and other illegal activities.

While the severity of each jailbreak may be considered low in isolation, the systemic nature of the vulnerability dramatically increases the risk. A motivated threat actor could exploit these weaknesses to automate the creation of harmful content at scale, potentially using legitimate AI services as proxies to mask their activities.

The widespread susceptibility of major platforms—ChatGPT, Claude, Copilot, DeepSeek, Gemini, Grok, MetaAI, and MistralAI—suggests that current approaches to AI safety and content moderation are insufficient to address adversaries’ evolving tactics.

This is particularly concerning given the growing reliance on generative AI across industries, from customer service to healthcare to finance, where the consequences of a successful jailbreak could be severe.

Vendor Responses

In response to the discovery of these vulnerabilities, affected vendors have begun to issue statements and implement mitigations.

DeepSeek, for instance, has acknowledged the report but maintains that the observed behavior constitutes a traditional jailbreak rather than an architectural flaw, noting that the AI’s references to “internal parameters” and “system prompts” are hallucinations rather than actual information leakage. The company has pledged to continue improving its security protections.

Other vendors, including OpenAI, Google, Meta, Anthropic, MistralAI, and X, have yet to issue public statements as of this writing, though internal investigations and updates are reportedly underway.

Industry experts emphasize that while post-hoc guardrails and content filters remain essential components of AI safety, they are not foolproof.

Attackers continue to develop new techniques, such as character injection and adversarial machine learning evasion, to exploit blind spots in moderation systems, reducing detection accuracy and enabling harmful content to slip through.
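As a rough illustration of what a post-hoc guardrail looks like and why surface-level checks have blind spots, here is a minimal, hypothetical sketch. The blocklist, function names, and normalization choices are invented for illustration and are not drawn from any vendor's actual moderation stack; production systems rely on trained classifiers rather than keyword lists.

```python
import unicodedata

# Hypothetical blocklist -- real moderation layers use trained classifiers,
# not a handful of keywords.
BLOCKED_TERMS = {"example-banned-term", "another-banned-term"}

def normalize(text: str) -> str:
    """Reduce trivial obfuscation before checking the text."""
    # Unicode NFKC normalization folds many look-alike characters together,
    # and stripping zero-width characters counters simple character
    # injection; neither step stops more sophisticated evasion.
    text = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in text if ch not in ("\u200b", "\u200c", "\u200d"))

def passes_post_hoc_filter(model_output: str) -> bool:
    """Return True if the output clears this (very naive) filter."""
    cleaned = normalize(model_output).lower()
    return not any(term in cleaned for term in BLOCKED_TERMS)

print(passes_post_hoc_filter("A perfectly ordinary reply."))  # True
```

Checks of this kind see only the final text, so content that is steered into existence over many turns, or phrased to sidestep whatever the classifier was trained on, can slip through; that is the gap the experts describe.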

The arms race between AI developers and adversaries is likely to intensify as generative models become more capable and widely adopted.

The jailbreaks are credited to security researchers David Kuzsmar, who reported the “Inception” technique, and Jacob Liddle, who identified the contextual bypass method.

Their work, documented by Christopher Cullen, has prompted renewed scrutiny of AI safety protocols and the urgent need for more robust, adaptive defenses.

As generative AI continues its rapid integration into daily life and critical infrastructure, the challenge of securing these systems against creative and persistent adversaries grows ever more complex.

