GPT-4 Is Capable Of Exploiting 87% Of One-Day Vulnerabilities

Large language models (LLMs) have achieved superhuman performance on many benchmarks, leading to a surge of interest in LLM agents capable of taking action, self-reflecting, and reading documents. 

While these agents have shown potential in areas like software engineering and scientific discovery, their capabilities in cybersecurity remain largely unexplored.

Cybersecurity researchers Richard Fang, Rohan Bindu, Akul Gupta, and Daniel Kang recently discovered that GPT-4 can exploit 87% of the one-day vulnerabilities in their benchmark, a significant advance in what LLM agents can do.

GPT-4 & One-Day Vulnerabilities

A benchmark of 15 real-world one-day vulnerabilities, including vulnerable websites, container management software, and Python packages, was collected from the CVE database and academic papers.

Researchers created a single LLM agent that can exploit 87% of the one-day vulnerabilities in their collected benchmark.


The agent, implemented in only 91 lines of code, is built on the ReAct agent framework and given the CVE description along with access to tools: web browsing elements, a terminal, web search results, file creation and editing, and a code interpreter.
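The paper does not reproduce the agent's source, but the core of a ReAct-style loop can be sketched in a few lines. Everything below is illustrative: `call_llm`, `parse_action`, and the toy tools are placeholders standing in for a real model API and real tooling, not the researchers' implementation.

```python
# Minimal sketch of a ReAct-style agent loop (illustrative only; not the
# authors' 91-line agent). call_llm() is a stub for a real GPT-4 API call.

def call_llm(prompt: str) -> str:
    """Stub for a model call; a real agent would query the LLM here.
    This canned version issues one action and then finishes."""
    if "Observation:" in prompt:
        return "Thought: done.\nFINISH"
    return "Thought: inspect the target first.\nAction: browse[http://target.local]"

TOOLS = {
    "browse": lambda url: f"<html of {url}>",       # headless-browser stand-in
    "run_shell": lambda cmd: f"<output of {cmd}>",  # terminal stand-in
}

def parse_action(step: str) -> tuple[str, str]:
    """Naive parser for a line like 'Action: browse[http://target]'."""
    line = next(l for l in step.splitlines() if l.startswith("Action:"))
    name, _, arg = line.removeprefix("Action: ").partition("[")
    return name.strip(), arg.rstrip("]")

def react_agent(cve_description: str, max_steps: int = 20) -> str:
    """Alternate model reasoning with tool calls until the agent finishes."""
    history = f"Task: exploit the vulnerability described below.\n{cve_description}\n"
    for _ in range(max_steps):
        step = call_llm(history)        # model emits Thought + Action text
        history += step
        if "FINISH" in step:            # agent declares it is done
            break
        tool, arg = parse_action(step)  # run the tool the model requested
        history += f"\nObservation: {TOOLS[tool](arg)}\n"
    return history

print(react_agent("Example CVE text describing the flaw."))
```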

GPT-4 achieved an 87% success rate, outperforming the other LLMs tested and open-source vulnerability scanners such as ZAP and Metasploit, all of which achieved a 0% success rate.

Without the CVE description, GPT-4’s success rate dropped to 7%, indicating that it is far more capable of exploiting known vulnerabilities than of finding new ones.
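That ablation amounts to withholding one block of text from the agent's prompt. A minimal sketch of the conditional prompt construction, with entirely hypothetical wording:

```python
def build_prompt(target: str, cve_description: str | None) -> str:
    """Assemble the agent's task prompt; passing cve_description=None
    mirrors the ablation in which the success rate fell from 87% to 7%."""
    prompt = f"Act as a penetration tester against the sandboxed target: {target}.\n"
    if cve_description is not None:
        prompt += f"Known vulnerability (CVE description): {cve_description}\n"
    else:
        prompt += "No vulnerability details are provided; find one yourself.\n"
    return prompt
```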

The manuscript describes the dataset of vulnerabilities, the agent, and its evaluation, exploring the capabilities of LLMs in hacking real-world one-day vulnerabilities.

To ascertain whether LLM agents can exploit real-world computer systems, researchers developed a benchmark of 15 real-world vulnerabilities from CVEs and academic papers. 

Vulnerabilities affecting closed-source software or with descriptions too underspecified to reproduce were excluded as infeasible. Fourteen of the vulnerabilities were obtained from open-source CVEs, and the fifteenth, the ACIDRain vulnerability, was drawn from an academic paper.

The vulnerabilities cover websites, container management software, and Python packages, and more than half of them are rated high or critical severity.

Importantly, 73% of the vulnerabilities were published after GPT-4’s knowledge cutoff date, and all are real-world vulnerabilities rather than toy “capture-the-flag” style challenges, making for a realistic evaluation.
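To make the benchmark's shape concrete, each entry can be imagined as a small record pairing a CVE identifier with the description handed to the agent. The format below is a hypothetical illustration, not the researchers' dataset schema, and the example values are paraphrased from public reporting on the runc vulnerability.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkEntry:
    """One benchmark vulnerability in a hypothetical record format."""
    cve_id: str        # identifier from the CVE database
    description: str   # the CVE description given to the agent
    severity: str      # CVSS rating, e.g. "high" or "critical"
    target_type: str   # "website", "container management software", or "python package"

# Illustrative entry modeled on the runc vulnerability from the benchmark;
# the field values are paraphrased, not the dataset's actual contents.
runc = BenchmarkEntry(
    cve_id="CVE-2024-21626",
    description="runc container escape via an internal leaked file descriptor",
    severity="high",
    target_type="container management software",
)
```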

LLM agent’s system diagram (Source – arXiv)

Models Tested

Below are all of the models that the researchers tested:

  • GPT-4
  • GPT-3.5
  • OpenHermes-2.5-Mistral-7B
  • Llama-2 Chat (70B)
  • Llama-2 Chat (13B)
  • Llama-2 Chat (7B)
  • Mixtral-8x7B Instruct
  • Mistral (7B) Instruct v0.2
  • Nous Hermes-2 Yi 34B
  • OpenChat 3.5

Vulnerabilities

Below are the vulnerabilities included in the benchmark:

  • runc
  • CSRF + ACE
  • WordPress SQLi
  • WordPress XSS-1
  • WordPress XSS-2
  • Travel Journal XSS
  • Iris XSS
  • CSRF + privilege escalation
  • alf.io key leakage
  • Astrophy RCE
  • Hertzbeat RCE
  • Gnuboard XSS
  • Symfony 1 RCE
  • Peering Manager SSTI RCE
  • ACIDRain

The analysis reveals that GPT-4’s high success rate stems from its ability to exploit complex multi-step vulnerabilities, switch between attack methods, craft code for exploits, and handle non-web vulnerabilities.

However, without the CVE description, GPT-4 struggles to identify the correct attack vector, underscoring that exploiting known vulnerabilities is far more straightforward for these agents than discovering new ones.

An informal analysis further suggests that GPT-4’s autonomy in exploitation could be substantially improved by additional features such as planning mechanisms and subagents.
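The paper stops at this informal observation, but a planner-plus-subagents design could look roughly like the sketch below, in which a planning step decomposes the task and each subtask gets its own focused agent. All names, the canned plan, and the control flow here are hypothetical.

```python
# Hypothetical planner/subagent decomposition (not implemented in the paper).

def plan(task: str) -> list[str]:
    """Stand-in for a planning LLM call that decomposes the exploit task.
    A real planner would query the model; this canned plan is illustrative."""
    return ["fingerprint the target", "locate the injection point",
            "craft and deliver the payload", "verify exploitation"]

def run_subagent(subtask: str, context: str) -> str:
    """Stand-in for launching a focused ReAct subagent on one subtask,
    giving it a fresh, smaller context than one monolithic agent would have."""
    return f"[subagent result for: {subtask}]\n"

def exploit_with_planning(task: str) -> str:
    """The planner dispatches subagents in order, accumulating their findings."""
    context = f"Task: {task}\n"
    for subtask in plan(task):
        context += run_subagent(subtask, context)
    return context

print(exploit_with_planning("exploit CVE-XXXX on the test target"))
```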


Tushar is a cybersecurity content editor with a passion for creating captivating and informative content. With years of experience in the field, he covers cyber security news, technology, and related topics.