Selective Learning Confusion Class for Text-Based CAPTCHA Recognition

Jun Chen, Jinwei Wang, Xiangyang Luo, Yuanyuan Ma, Yingying Liu

doi:10.1109/access.2019.2899044

Abstract

Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) recognition is one of the most important branches of CAPTCHA research. Existing CAPTCHA recognition methods based on Deep Convolutional Neural Networks (DCNNs) have low recognition accuracy on confusion classes. To solve this problem, a novel method of selective learning of confusion classes for text-based CAPTCHA recognition is presented. First, a two-stage DCNN framework is proposed, which integrates an all-class DCNN and confusion-class DCNNs. Second, a confusion relation matrix is constructed to show the confusion relations among classes; it is used to analyze the output of the all-class DCNN. Third, a set partition algorithm is presented that divides a confusion class set into multiple subsets, each corresponding to a new confusion-class DCNN. Fourth, to improve the recognition accuracy of confusing characters in the confusion-class DCNNs, an interactive training-and-validation learning algorithm is proposed. Lastly, the outputs of the two stages are combined as the final recognition result. Experimental results on real CAPTCHA data sets demonstrate that, compared with four state-of-the-art attacks, the proposed method improves text-based CAPTCHA recognition accuracy by 1.4% to 39.4%.
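
The abstract does not spell out how the confusion relation matrix is built or how the set partition works, so the following is only an illustrative sketch under stated assumptions, not the authors' algorithm. It assumes that two classes are "confused" when the all-class model misclassifies one as the other at a rate of at least a hypothetical `threshold`, and it swaps in a plain connected-components grouping as a stand-in for the set partition step. The function names `confusion_relation` and `partition_confusion_classes` are hypothetical.

```python
def confusion_relation(confusion_counts, threshold):
    """Build a symmetric boolean relation matrix from a confusion-count matrix.

    confusion_counts[i][j] = number of class-i samples predicted as class j.
    Assumption: classes i and j are related when the misclassification rate
    in either direction is at least `threshold`.
    """
    n = len(confusion_counts)
    rel = [[False] * n for _ in range(n)]
    for i in range(n):
        total = sum(confusion_counts[i])
        if total == 0:
            continue  # no samples for this class, nothing to measure
        for j in range(n):
            if i != j and confusion_counts[i][j] / total >= threshold:
                rel[i][j] = rel[j][i] = True
    return rel


def partition_confusion_classes(rel):
    """Group related classes into subsets (connected components).

    Assumption: each subset would be handled by its own confusion-class
    model. Classes with no confusion relation are left out entirely.
    """
    n = len(rel)
    seen = set()
    subsets = []
    for start in range(n):
        if start in seen or not any(rel[start]):
            continue
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in range(n):
                if rel[node][neighbour] and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        subsets.append(sorted(component))
    return subsets


# Toy example with four classes; the counts are made up for illustration.
counts = [
    [90, 10, 0, 0],
    [8, 92, 0, 0],
    [0, 0, 100, 0],
    [0, 0, 1, 99],
]
relation = confusion_relation(counts, threshold=0.05)
subsets = partition_confusion_classes(relation)
```

In this toy setting only classes 0 and 1 exceed the assumed threshold, so they form the single subset that a second-stage model would be trained on; the threshold value and the grouping rule are choices made for the sketch, not values taken from the paper.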

Full Text