Abstract

Background: A CAPTCHA is a mechanism for distinguishing humans from bots. It has become a standard means of protecting resources on the World Wide Web from misuse. Several types of CAPTCHA are in use, but text-based schemes are the most widely used because of their ease of use and robustness. The user is asked to type in the text shown in an image that has been intentionally distorted to defeat bots. Recognizing the text is meant to be easy for humans but very hard for computers.

Method/Findings: In this work, a text-based CAPTCHA scheme with background clutter and partially connected characters is decoded. The main steps are preprocessing, segmentation and recognition. Several digital image processing techniques were applied during the preprocessing and segmentation steps, and a convolutional neural network (CNN) was used for recognition. Because a CNN requires a large amount of data, the data was generated synthetically. A complex text-based CAPTCHA scheme with a varying number of letters (3, 4 and 5) is decoded with an overall precision of 77.5%, 64.2% and 51.9%, respectively.

Keywords: CAPTCHAs; HIPs; image processing; machine learning; CNN
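The preprocessing and segmentation steps named above can be illustrated with a minimal sketch. The source does not say which image processing techniques were used, so the choices here (a fixed threshold, an isolated-pixel filter for clutter, and a vertical projection for splitting characters) are assumptions, and all function names are hypothetical. A plain projection cannot separate partially connected characters, which the decoded scheme contains, so a real pipeline would need a stronger segmentation step.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows: 0 = black ink, 255 = white background


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Map each pixel to 1 (ink) or 0 (background) with a fixed threshold (assumed)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def remove_isolated_pixels(binary: Image) -> Image:
    """Clear ink pixels that have no ink in their 8-neighbourhood (simple clutter filter)."""
    h, w = len(binary), len(binary[0])
    cleaned = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            neighbours = sum(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ) - 1  # subtract the centre pixel itself
            if neighbours == 0:
                cleaned[y][x] = 0
    return cleaned


def segment_columns(binary: Image) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, split at ink-free columns."""
    width = len(binary[0])
    ink_per_column = [sum(row[x] for row in binary) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(ink_per_column):
        if ink and start is None:
            start = x                      # a character region begins
        elif not ink and start is not None:
            segments.append((start, x))    # an empty column ends the region
            start = None
    if start is not None:
        segments.append((start, width))    # region runs to the right edge
    return segments


# Toy 5x9 image: two 3x2 "characters" plus one stray clutter pixel.
toy = [[255] * 9 for _ in range(5)]
for y in range(1, 4):
    for x in (1, 2, 5, 6):
        toy[y][x] = 0
toy[0][8] = 0  # isolated noise pixel

cleaned = remove_isolated_pixels(binarize(toy))
boxes = segment_columns(cleaned)
print(boxes)
```

Each column range in `boxes` would then be cropped and passed to the recognizer, which in the source is a CNN.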

Highlights

  • CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a test program meant to distinguish between a computer and a human

  • Text-based CAPTCHAs are the most popular because they are easy for most users and provide better security

  • In a text-based CAPTCHA, the user is asked to type in a noisy, distorted string of random characters


Summary

Introduction

CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a test program meant to distinguish between a computer and a human, and it has become a standard means of protecting resources on the World Wide Web from misuse. Bots may send junk e-mail, post unauthorized advertisements and flood servers with heavy traffic; such misuse can degrade the performance of internet servers. Several types of CAPTCHA are in use, but text-based schemes are the most widely used because of their ease of use and robustness. In a text-based CAPTCHA, the user is asked to type in a noisy, distorted string of random characters. The distortions are introduced intentionally to protect against bots. In this work, a text-based CAPTCHA scheme with background clutter and partially connected characters is decoded. Several digital image processing techniques were applied during the preprocessing and segmentation steps, and a convolutional neural network (CNN) was used for recognition. A complex text-based CAPTCHA scheme with a varying number of letters (3, 4 and 5) is decoded with an overall precision of 77.5%, 64.2% and 51.9%, respectively.
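One way to compare the three reported figures is to ask what per-character rate each would imply. The sketch below assumes that a string is decoded correctly only if every character is, and that character errors are independent. Neither assumption is stated in the source, and the function name is hypothetical.

```python
# Reported whole-string rates from the text, keyed by CAPTCHA length.
reported = {3: 0.775, 4: 0.642, 5: 0.519}


def implied_per_character_rate(string_rate: float, length: int) -> float:
    """Return p such that p ** length == string_rate (independence is assumed)."""
    return string_rate ** (1.0 / length)


implied = {n: implied_per_character_rate(rate, n) for n, rate in reported.items()}
for n, p in implied.items():
    print(f"{n} letters: implied per-character rate ~ {p:.3f}")
```

Under these assumptions the implied per-character rate falls as the string gets longer, which suggests that errors are not fully independent of length. The source reports no per-character figures, so this is only a back-of-the-envelope reading.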

