Abstract

To distinguish humans from machines, the Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is increasingly used in web applications. Classical CAPTCHAs based on English letters and digits can now be recognized automatically with high accuracy. Because the complexity of Chinese characters makes automatic recognition much harder, a growing number of Chinese web sites use Chinese Character CAPTCHAs. To recognize Chinese Character CAPTCHAs, we propose a Convolutional Neural Network (CNN) based approach that learns stroke, radical and character features of Chinese characters, and we show that our network structure outperforms LeNet-5 on this task. Furthermore, we formulate the relation among accuracy, the number of training samples and the number of iterations, which is used to estimate the performance of our approach. Firstly, this approach greatly improves the recognition accuracy of Chinese Character CAPTCHAs with distortion, rotation and background noise. Our experimental results show that it achieves over 95% accuracy for a single Chinese character and 84% accuracy for three types of Chinese Character CAPTCHAs containing four Chinese characters. Secondly, our experimental results and theoretical analysis show that recognition accuracy has an exponential relationship with the product of the number of training samples and the number of iterations, provided the training samples are sufficient and representative. Therefore, we can estimate the training time needed to reach a given accuracy. Finally, we show that our approach outperforms Hanvon, the best-known Chinese Optical Character Recognition (OCR) software, in recognizing Chinese Character CAPTCHAs.
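As a rough consistency check on the two reported figures, the sketch below relates single-character accuracy to whole-CAPTCHA accuracy. It assumes that a four-character CAPTCHA counts as solved only when all four characters are correct and that per-character errors are independent; neither assumption is stated in the abstract, and the function names are illustrative only.

```python
def captcha_accuracy(per_char_accuracy: float, num_chars: int = 4) -> float:
    """Whole-CAPTCHA accuracy if every character must be right.

    Assumption (not from the paper): per-character errors are independent.
    """
    return per_char_accuracy ** num_chars


def required_per_char_accuracy(captcha_acc: float, num_chars: int = 4) -> float:
    """Per-character accuracy implied by a whole-CAPTCHA accuracy,
    under the same independence assumption."""
    return captcha_acc ** (1.0 / num_chars)


# Figures quoted in the abstract: over 95% per character, 84% per CAPTCHA.
whole = captcha_accuracy(0.95)
needed = required_per_char_accuracy(0.84)

print(f"4-character accuracy at 95% per character: {whole:.4f}")   # 0.8145
print(f"per-character accuracy implied by 84%:     {needed:.4f}")  # 0.9573
```

Under these assumptions, 84% on four-character CAPTCHAs corresponds to roughly 95.7% per character, which agrees with the "over 95%" single-character figure.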
