Using domain-adversarial learning to recognize text captchas

Denis Kushchuk, Alexander Yatskov, Maksim Varlamov, Maxim Ryndin

doi:10.15514/ispras-2020-32(4)-15

Abstract

The legal regulation of automatic collection of information from websites is now being actively worked out. As a result, interest in tools for automatic data collection is growing, and with it interest in programs that automatically solve CAPTCHAs ("Completely Automated Public Turing test to tell Computers and Humans Apart"). Despite the creation of more advanced CAPTCHA types, text captchas remain quite common; large services such as Yandex, Google, Wikipedia, and VK continue to use them. The literature describes many methods for breaking text captchas, but most of them require the length of the text in the image to be known a priori, which is not always the case in the real world. Many methods are also multi-stage, which makes them less convenient to implement and apply. Moreover, some methods need a large number of labeled images for training, whereas in practice, collecting data requires solving captchas from many different sites. Manually labeling tens of thousands of training examples for each new type of captcha is therefore unrealistic. In this paper we propose a one-step algorithm for attacking text captchas. The approach does not require a priori knowledge of the length of the text in the image. Experiments show that using this algorithm together with adversarial learning achieves high quality on real data with only a small number (200-500) of labeled training examples. An experimental comparison with modern analogs showed that, given the same number of real training examples, our algorithm achieves comparable or higher quality while training and running faster.

Highlights

Automatic collection of information gives researchers and analysts access to large volumes of up-to-date data for their work
In this paper we propose a one-step algorithm for attacking text captchas
An experimental comparison with modern analogs showed that, given the same number of real training examples, our algorithm achieves comparable or higher quality while training and running faster

Summary

Introduction

Automatic collection of information from websites is currently being brought under legal regulation worldwide, so interest in programs that solve captchas automatically is growing. These algorithms consist of two main phases: in the first phase, training examples are generated from a large amount of unlabeled data and, possibly, a small number of labeled examples. This approach substantially reduces the number of training examples required, which is essential for captcha recognition, because collecting and labeling real data from websites is quite laborious. In this paper we present a new method that combines the approaches named above and offers a simple architecture, accuracy, fast training and inference, and a need for only a small number of training examples. The second part presents the DA-CRNN algorithm, whose main idea is to use adversarial learning to reduce the number of examples needed for training.
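The adversarial idea behind DA-CRNN can be illustrated with a gradient reversal layer, the usual building block of domain-adversarial training. The sketch below is a minimal pure-Python illustration, not the authors' implementation: the function names are hypothetical, and the lambda schedule is the one commonly used in domain-adversarial networks, assumed here rather than taken from the paper. A shared feature extractor receives the recognition gradient unchanged and the domain-classifier gradient with its sign flipped, which pushes the features toward being indistinguishable between synthetic and real captchas.

```python
import math


def grl_forward(features):
    """Forward pass of a gradient reversal layer: the identity."""
    return list(features)


def grl_backward(grad_from_domain_head, lam):
    """Backward pass: flip the sign of the gradient and scale it by lam."""
    return [-lam * g for g in grad_from_domain_head]


def dann_lambda(progress, gamma=10.0):
    """Assumed schedule: lam grows from 0 to about 1 as training
    progress goes from 0.0 to 1.0 (gamma is a hypothetical default)."""
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


def feature_gradient(grad_recognition, grad_domain, lam):
    """Gradient reaching the shared feature extractor: the recognition
    gradient plus the reversed domain-classifier gradient."""
    reversed_domain = grl_backward(grad_domain, lam)
    return [r + d for r, d in zip(grad_recognition, reversed_domain)]


if __name__ == "__main__":
    lam = dann_lambda(progress=0.5)
    print(feature_gradient([1.0, 2.0], [0.5, -1.0], lam))
```

In a deep learning framework the same effect is usually obtained with a custom autograd operation placed between the feature extractor and the domain classifier, so that a single backward pass trains both heads.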

Convolutional neural networks

Segmentation and recognition

Generative methods

Representation learning

Description of the method for recognizing text in an image
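One common way for a CRNN-style recognizer to avoid needing the text length in advance is CTC decoding: the network emits one score vector per image column, and the decoder collapses repeated symbols and drops a special blank symbol. Whether the authors use exactly this decoder is an assumption here; the function name, the toy alphabet, and the scores below are hypothetical and serve only as an illustration.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Greedy CTC decoding.

    frame_scores: one list of scores per time step; index `blank` is the
    blank symbol and index i > 0 maps to alphabet[i - 1] (assumed layout).
    """
    # Pick the highest-scoring symbol index at every time step.
    best = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]

    chars = []
    prev = blank
    for idx in best:
        # Keep a symbol only if it is not blank and not a repeat
        # of the previous time step.
        if idx != blank and idx != prev:
            chars.append(alphabet[idx - 1])
        prev = idx
    return "".join(chars)


if __name__ == "__main__":
    frames = [
        [0.1, 0.8, 0.1],    # 'a'
        [0.1, 0.7, 0.2],    # 'a' again, collapsed
        [0.9, 0.05, 0.05],  # blank
        [0.2, 0.1, 0.7],    # 'b'
        [0.1, 0.6, 0.3],    # 'a'
    ]
    print(ctc_greedy_decode(frames, "ab"))
```

The length of the decoded string depends only on the per-column predictions, so the same model can read captchas of different lengths without a separate length estimate.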

The DA-CRNN algorithm

Dataset

Results of the CRNN model on synthetic images

Training and computations were performed on a GeForce graphics card

Results of the DA-CRNN model on real data

Conclusion

Journal: Proceedings of the Institute for System Programming of the RAS
Publication Date: Jan 1, 2020
Citations: 1
License type: cc-by
