End-to-end attack on text-based CAPTCHAs based on cycle-consistent generative adversarial network

Chunhui Li,Xingshu Chen,Haizhou Wang,Peiming Wang,Yu Zhang,Wenxian Wang

doi:10.1016/j.neucom.2020.11.057

Abstract

As a widely deployed security scheme, text-based completely automated public Turing tests to tell computers and humans apart (CAPTCHAs) have become increasingly unable to resist machine learning-based attacks. So far, many researchers have conducted studies on approaches for attacking text-based CAPTCHAs deployed by different companies, such as Microsoft, Amazon, and Apple, and achieved specific results. However, most of these attacks have shortcomings, such as the poor portability of attack methods, which require a series of data preprocessing steps and rely on large amounts of labeled CAPTCHAs. In this study, we propose an efficient and simple end-to-end attack method based on cycle-consistent generative adversarial networks (Cycle-GANs). Compared to previous studies, our approach significantly reduces the cost of data labeling. Additionally, this method has high portability. It can attack ordinary text-based CAPTCHA schemes only by modifying a few configuration parameters, which makes the attack easier to execute. First, we train CAPTCHA synthesizers based on the Cycle-GAN to generate some fake samples. Basic recognizers based on a convolutional recurrent neural network are trained using the fake data. Subsequently, an active transfer learning method is employed to optimize the basic recognizer utilizing tiny amounts of labeled real-world CAPTCHA samples. Our approach efficiently cracked the CAPTCHA schemes deployed by 10 popular websites, indicating that our attack method may be universal. Additionally, we analyzed the current most popular anti-recognition mechanisms. The results show that the combination of more anti-recognition mechanisms can improve the security of CAPTCHAs. However, the improvement is limited. Conversely, generating more complex CAPTCHAs may cost more resources and reduce the usability of CAPTCHAs.

Full Text