Abstract

Text-based captchas remain the most widely used captcha systems in both business and government institutions, notwithstanding their shortcomings. The more easily a captcha system can be solved automatically, the greater the risk to the website. State-of-the-art pretrained hybrid CNN models (a CNN combined with an RNN, or a CNN combined with a Transformer) have been used with large training datasets for image-to-text (character-sequence) applications such as OCR and text-captcha solving. However, it is critical to examine these architectures in low-resource settings (i.e., with minimal weight parameters) so that they can be trained quickly on a fresh dataset using limited hardware resources. In such a low-resource context, the models can be readily adapted and trained from scratch to test the effectiveness of multiple text-based captcha systems. In our study, we focus on the capability of a single encoder-based Transformer network to solve a real-time captcha system in low-resource settings, using a small set of manually annotated captcha training samples. Experimental results show that the non-CNN Transformer approach outperforms the CNN approach even with fewer model weight parameters. For this study, we concentrate on the real-time captcha systems used by five Indian government websites. We train the models from scratch on the corresponding manually annotated real-time datasets to demonstrate the vulnerability of each captcha system and to report the performance of the models under low-resource conditions. We observe that, although the information available on these government websites is highly important, the effort required to solve their captcha systems is modest, indicating a potential risk.
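To make the encoder-only idea concrete, the following is a minimal numpy sketch of the core computation such a model performs: a captcha image is split into a sequence of patch embeddings, a single scaled dot-product self-attention layer mixes information across patches, and a per-position classifier emits one character prediction per patch. All names, shapes, and sizes here are illustrative assumptions, not the architecture actually used in the paper.

```python
import numpy as np

# Illustrative sketch only: a one-layer self-attention encoder over patch
# embeddings of a captcha image, with a per-position character classifier.
# Sizes and weight names are assumptions, not the paper's architecture.

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Scaled dot-product self-attention over the patch sequence.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V

# Toy setup: a 6-character captcha cut into 6 vertical patch embeddings,
# classified over 36 classes (digits + lowercase letters).
seq_len, d_model, n_classes = 6, 32, 36
X = rng.standard_normal((seq_len, d_model))            # patch embeddings
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(3))
Wo = rng.standard_normal((d_model, n_classes)) * 0.1   # per-position classifier

H = self_attention(X, Wq, Wk, Wv)   # encoder output: one vector per patch
logits = H @ Wo
pred = logits.argmax(axis=-1)       # one predicted character class per position
```

A trained model would stack several such layers with positional encodings and learned weights; the point of the sketch is that the entire image-to-character mapping needs only an encoder with a classification head, with no recurrent decoder, which keeps the parameter count small for low-resource training.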
