Abstract
Background: CAPTCHA is a mechanism for distinguishing humans from bots. It has become a standard means of protecting resources on the World Wide Web from misuse. Several types of CAPTCHA are in use, but text-based schemes are the most widely used because they are easy to use and robust. The user is asked to type the text shown in an image that has been intentionally distorted to defeat bots. Recognizing the text is easy for humans but very hard for computers. Method/Findings: In this work, a text-based CAPTCHA scheme with background clutter and partially connected characters is decoded. The main steps are preprocessing, segmentation and recognition. Several digital image processing techniques were applied during the preprocessing and segmentation steps, and a convolutional neural network (CNN) was used for recognition. Because a CNN requires a large amount of data, the data was generated synthetically. A complex text-based CAPTCHA scheme with a varying number of letters (3, 4 and 5) is decoded with an overall precision of 77.5%, 64.2% and 51.9%, respectively. Keywords: CAPTCHAs; HIPs; image processing; machine learning; CNN
Highlights
CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a computer test program meant to distinguish between a computer and a human
Text-based CAPTCHAs are the most popular because they are easy for most users and provide better security
In a text-based CAPTCHA, the user is asked to type in a noisy, distorted string of random characters
Summary
CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a computer test program meant to distinguish between a computer and a human. It has become a standard means of protecting resources on the World Wide Web from misuse: bots may send junk e-mails, post unauthorized advertisements and flood servers with heavy traffic, and these misuses can degrade the performance of internet servers. Several types of CAPTCHA are in use, but text-based schemes are the most widely used because they are easy to use and robust. In a text-based CAPTCHA, the user is asked to type in a noisy, distorted string of random characters. The distortions are intentionally introduced into the text string to protect against bots. Method/Findings: In this work, a text-based CAPTCHA scheme with background clutter and partially connected characters is decoded. Several digital image processing techniques were applied during the preprocessing and segmentation steps, and a convolutional neural network (CNN) was used for recognition. A complex text-based CAPTCHA scheme with a varying number of letters (3, 4 and 5) is decoded with an overall precision of 77.5%, 64.2% and 51.9%, respectively.
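The abstract does not say which image processing techniques were used, so the following is only an illustrative sketch of what a preprocessing and segmentation stage can look like, not the authors' method. It assumes a grayscale image stored as rows of integers (0 for black, 255 for white), a fixed binarization threshold, and segmentation by vertical projection; the function names `binarize`, `column_projection` and `segment_columns` are hypothetical. Projection-based splitting only separates characters that have an empty column between them, so partially connected characters such as those in the scheme studied here would need additional handling.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, 0 (black) .. 255 (white)


def binarize(image: Image, threshold: int = 128) -> List[List[int]]:
    """Mark dark pixels (assumed to be ink) as 1 and light pixels as 0."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def column_projection(binary: List[List[int]]) -> List[int]:
    """Count ink pixels in every column."""
    return [sum(col) for col in zip(*binary)]


def segment_columns(binary: List[List[int]], min_width: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges that contain ink.

    A range ends at the first column with no ink, so touching
    characters come back as a single range.
    """
    profile = column_projection(binary)
    segments: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x  # entering an ink run
        elif count == 0 and start is not None:
            if x - start >= min_width:
                segments.append((start, x))
            start = None  # leaving an ink run
    # Close a run that reaches the right edge of the image.
    if start is not None and len(profile) - start >= min_width:
        segments.append((start, len(profile)))
    return segments


# Tiny hand-made image with two separate dark blobs.
toy_image = [
    [255, 0, 0, 255, 255, 0, 255],
    [255, 0, 255, 255, 255, 0, 255],
]
toy_segments = segment_columns(binarize(toy_image))
```

Each returned column range could then be cropped and passed to a classifier such as a CNN for recognition. Raising `min_width` discards narrow specks, which is one simple way to ignore isolated clutter pixels.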