Defeating line-noise CAPTCHAs with multiple quadratic snakes

Yoichi Nakaguro,Matthew N Dailey,Sanparith Marukatat,Stanislav S Makhanov

doi:10.1016/j.cose.2013.05.003

Abstract

Optical character recognition (OCR) is one of the fundamental problems in artificial intelligence and image processing, but recent progress in OCR represents a security challenge for Web sites that throttle requests with image based CAPTCHAs (Completely Automated Public Turing Tests to Tell Computers and Humans Apart). A CAPTCHA is challenge-response test placed within web forms to determine whether the user is human. Unfortunately, algorithms capable of solving image based CAPTCHAs can be used to create spam accounts and design malicious denial of service (DoS) attacks, causing financial and social damage. The problem of defeating digital image CAPTCHAs is thus twofold. On the one hand, it is an important problem in artificial intelligence and image processing. On the other hand, publicly available CAPTCHAs that are not tested against state of the art machine recognition algorithms may make the systems vulnerable to attack by software bots.This paper considers a very important subclass of text CAPTCHAs, those characterized by salt and pepper noise combined with line (curve) noise. Thus far, attacks on CAPTCHAs with this type of noise have used relatively simple image processing methods with some success, but state-of-the-art segmentation methods have not been fully exploited. In this paper, we propose and benchmark two strong segmentation methods. The first method is a modification of a multiple quadratic snake proposed for road extraction from satellite images. The second competing method is a boundary tracing routine available in the OpenCV open source library.A first numerical experiment indicates excellent accuracy for both methods. A second experiment on human recognition shows that the CAPTCHAs used in the study are already near the threshold of being too hard for humans. Finally, a third numerical experiment presents a more difficult set of CAPTCHAs with the addition of anti-binarization methods. The snake-based method is shown to be more resilient to anti-binarization schemes than boundary tracing and state-of-the art projection-based attacks on CAPTCHAs.Since CAPTCHAs corrupted by small line noise are shown to be difficult for humans and relatively easy for our algorithm, CAPTCHA designers should introduce more challenging distortions into their CAPTCHAs, lest the security of systems based on them be compromised.

Full Text