Researchers Break Google Audio reCAPTCHA with Google’s own Speech to Text API

Cybersecurity researcher Nikolai Tschacher has recently posted a proof-of-concept (PoC) video of an attack that breaks Google’s audio reCAPTCHA with Google’s own Speech to Text API. The attack is an old method that dates back to 2017; it uses speech-to-text to circumvent CAPTCHA protections, and it still works on Google’s latest reCAPTCHA version 3.

The term CAPTCHA was coined in 2003 and stands for ‘Completely Automated Public Turing Test to Tell Computers and Humans Apart.’


reCAPTCHA is Google’s name for its own CAPTCHA technology, a free service that uses image, audio, or text challenges to confirm that a human is accessing an account.

In other words, it is code that Google offers free of charge to accounts that handle fewer than 1 million queries a month.

Whole Process

By now, almost everyone is familiar with the reCAPTCHA process. To include visually impaired people, Google also provides an audio version of its reCAPTCHA.

That makes the attack simple: one can grab the MP3 file of the audio reCAPTCHA and submit it to Google’s Speech to Text API. Google returns the correct answer in over 97% of all cases.

Proof of Concept

The proof of concept includes a video that shows how Tschacher’s bot operates. He added that this attack method works even on the latest version of reCAPTCHA v3.

Tschacher also pointed out that his bot would not be easy to exploit at scale, for three specific reasons:

  • Google rate-limits audio CAPTCHA access.
  • Google is likely tracking bot metrics.
  • It produces a fingerprint of each browsing device to hinder bots.
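
The first of these reasons, rate limiting, can be illustrated with a sliding-window counter kept per client. This is only a minimal sketch of the general technique, not Google’s implementation: the class name, the limit of 3 requests per 60 seconds, and the use of an IP address as the client key are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client key."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        # client key -> timestamps of the requests that were allowed
        self._hits = defaultdict(deque)

    def allow(self, client: str, now: float) -> bool:
        hits = self._hits[client]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: reject and do not record the request
        hits.append(now)
        return True


# Hypothetical client requesting an audio challenge at t = 0, 1, 2, 3 and 61 seconds.
limiter = SlidingWindowLimiter(limit=3, window=60.0)
decisions = [limiter.allow("203.0.113.7", t) for t in (0, 1, 2, 3, 61)]
print(decisions)  # [True, True, True, False, True]
```

The fourth request is rejected because three requests already fall inside the window. The fifth is accepted because the two oldest timestamps have expired by then. A bot that needs many audio challenges would hit this kind of ceiling quickly.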

Security experts note, however, that as AI advances and gets better at passing the Turing test, effective CAPTCHAs become harder and harder to build.

CAPTCHAs would then be replaced by passive AI that continuously processes all kinds of data to determine whether the browsing signals look human. That decision would be based on the browsing fingerprint, JavaScript usage, interaction signals such as mouse movements and key presses, and IP-address metadata.
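
A passive check of this kind can be pictured as a function that combines several signals into a single score. The sketch below is purely illustrative: the field names, the equal weights of 0.25, and the threshold of 10 interaction events are hypothetical, and a real system would presumably learn such values from data rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    known_fingerprint: bool  # browsing fingerprint seen before with normal behavior
    runs_javascript: bool    # client executed the page's JavaScript
    mouse_moves: int         # pointer events observed during the session
    key_presses: int         # keyboard events observed during the session
    ip_is_datacenter: bool   # IP-address metadata points at a hosting provider


def human_score(s: Signals) -> float:
    """Return a score in [0, 1]; higher means more human-like (made-up weights)."""
    score = 0.0
    score += 0.25 if s.known_fingerprint else 0.0
    score += 0.25 if s.runs_javascript else 0.0
    score += 0.25 if (s.mouse_moves + s.key_presses) >= 10 else 0.0
    score += 0.25 if not s.ip_is_datacenter else 0.0
    return score


visitor = Signals(True, True, mouse_moves=42, key_presses=7, ip_is_datacenter=False)
script = Signals(False, True, mouse_moves=0, key_presses=0, ip_is_datacenter=True)
print(human_score(visitor), human_score(script))  # 1.0 0.25
```

The point of the design is that no single signal decides the outcome, so a bot that fakes one signal (for example, by running JavaScript) still scores low on the others.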


BALAJI is an Ex-Security Researcher (Threat Research Labs) at Comodo Cybersecurity. Editor-in-Chief & Co-Founder - Cyber Security News & GBHackers On Security.