Automatic Labeling for Scene Text Database

Masakazu Iwamura, Masaki Tsukada, Koichi Kise

doi:10.1109/icdar.2013.276

Abstract

It is generally thought that a large quantity of data improves the quality of recognition. A large database, however, is not easy to obtain. The hardest task is labeling (also known as ground truthing), which usually requires human intervention. Since labeling by humans is laborious and costly, labeling without humans (automatic labeling) and labeling with minimal human intervention (semi-automatic labeling) are the ideal scenarios. As a step toward realizing these scenarios, it is important to know how much an automatic labeling system can accomplish without human intervention. In this paper, we propose a comprehensive automatic labeling technique for a scene text database, which performs segmentation and labeling of unsegmented and unlabeled character images. To the best of our knowledge, this is the first method to realize the comprehensive process of automatic labeling for scene text databases. In experiments, we confirmed that the proposed method could add new unlabeled data while simultaneously improving the recognition performance of the classifier.

Full Text