A Robust Ensemble of ResNets for Character Level End-to-end Text Detection in Natural Scene Images

Yoonhyung Kim, Jinsu Kim, Changick Kim

doi:10.1145/3095713.3095724

Abstract

Detecting text in natural scene images is a challenging task. In this paper, we propose a character-level end-to-end text detection algorithm for natural scene images. In general, text detection tasks fall into three categories: text localization, text segmentation, and text recognition. The proposed method aims not only to localize text but also to recognize it. To this end, the proposed method consists of four steps: character candidate patch extraction, patch classification using an ensemble of ResNets, non-character region elimination, and character region grouping via self-tuning spectral clustering. In the character candidate patch extraction step, candidate patches are extracted from the image using both edge information from multi-scale images and Maximally Stable Extremal Regions (MSERs). Each patch is then classified as either a character patch or a non-character patch by a deep network composed of three ResNets with different hyper-parameters. Text regions are determined by filtering out the non-character patches. To further reduce classification errors, character characteristics are used to compensate for the classification results of the ensemble of ResNets. To evaluate text detection performance, character regions are grouped via self-tuning spectral clustering. The proposed method shows competitive performance on the ICDAR 2013 dataset.
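The classification, filtering, and grouping steps can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the abstract does not say how the three ResNets are combined, so simple probability averaging (soft voting) stands in for the ensemble; the per-model probabilities are made-up numbers rather than network outputs; and patch centers are used as the only grouping feature. The affinity follows the local-scaling rule of self-tuning spectral clustering, A_ij = exp(-d_ij^2 / (sigma_i * sigma_j)), where sigma_i is the distance from point i to its k-th nearest neighbor. The names `ensemble_character_prob`, `filter_patches`, `local_scaling_affinity`, and the values of `threshold` and `k` are hypothetical.

```python
import math

def ensemble_character_prob(per_model_probs):
    """Average the character probabilities of all models (soft voting, assumed)."""
    return sum(per_model_probs) / len(per_model_probs)

def filter_patches(patches, threshold=0.5):
    """Keep patches whose averaged character probability reaches the threshold."""
    return [p for p in patches
            if ensemble_character_prob(p["probs"]) >= threshold]

def local_scaling_affinity(points, k=1):
    """Locally scaled affinity matrix used by self-tuning spectral clustering.

    sigma_i is the distance from point i to its k-th nearest neighbor.
    Requires len(points) > k and no duplicate points (sigma_i must be > 0).
    """
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Each sorted row starts with the self-distance 0, so index k is the
    # k-th nearest neighbor.
    sigma = [sorted(row)[k] for row in dist]
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                affinity[i][j] = math.exp(-dist[i][j] ** 2 / (sigma[i] * sigma[j]))
    return affinity

# Hypothetical candidate patches: a center (x, y) and one character
# probability per model in a three-model ensemble (illustrative values).
patches = [
    {"center": (10, 50),   "probs": [0.90, 0.80, 0.95]},
    {"center": (30, 50),   "probs": [0.70, 0.90, 0.80]},
    {"center": (50, 50),   "probs": [0.85, 0.60, 0.90]},
    {"center": (10, 200),  "probs": [0.90, 0.90, 0.70]},
    {"center": (30, 200),  "probs": [0.80, 0.75, 0.90]},
    {"center": (50, 200),  "probs": [0.95, 0.85, 0.80]},
    {"center": (300, 400), "probs": [0.20, 0.10, 0.40]},  # likely background
]

kept = filter_patches(patches)
affinity = local_scaling_affinity([p["center"] for p in kept], k=1)
```

In this toy layout, neighboring patches on the same text line receive a much higher affinity than patches on different lines, which is what allows a spectral clustering step to separate the two lines. A full pipeline would go on to form the normalized affinity matrix, take its leading eigenvectors, and select the number of clusters automatically; those steps are omitted here.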

Full Text