Abstract

Irregular text has widespread applications in multiple areas. Unlike regular text, irregular text is difficult to recognize because of its varied shapes and distorted patterns. In this paper, we develop a multidirectional convolutional neural network (MCN) to extract four direction features that fully describe the textual information. Meanwhile, the character placement possibility is extracted and used as the weight of the four direction features. Building on these components, we propose an encoder that fuses the four direction features into a feature code for predicting the character sequence. The whole network is trainable end to end using only images and word-level labels. Experiments on standard benchmarks, including the IIIT-5K, SVT, CUTE80, and ICDAR datasets, demonstrate the superiority of the proposed method on both regular and irregular text. The developed method improves accuracy by 1.2% on the CUTE80 dataset and 1.5% on the SVT dataset, and has fewer parameters than most existing methods.

Highlights

  • Abstract: Irregular text has widespread applications in multiple areas

  • Each image is associated with a 50-word lexicon defined by Wang et al. [17]

  • We present a novel neural network for scene text recognition by (1) designing BCN and MCN to extract four direction features and the corresponding character placement possibility, (2) using an encoder to integrate the four direction features and their placement possibility and gain a feature code, and (3) applying a decoder to recognize the character sequence
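The fusion step named in the highlights, combining the four direction features under their character placement possibility, can be sketched as follows. This is a minimal numpy illustration under assumed shapes and names (`T` timesteps, `C` channels, raw `placement_scores`), not the paper's exact architecture:

```python
import numpy as np

def fuse_direction_features(direction_feats, placement_scores):
    """Fuse four direction features into one feature code.

    direction_feats: (4, T, C) feature sequences from the four directions.
    placement_scores: (4, T) raw character placement scores per direction.
    Returns the fused feature code of shape (T, C).
    """
    # Normalize the placement scores across the four directions (softmax),
    # so the weights at each timestep sum to 1.
    s = placement_scores - placement_scores.max(axis=0, keepdims=True)
    w = np.exp(s) / np.exp(s).sum(axis=0, keepdims=True)  # (4, T)
    # Weighted sum over the direction axis yields one feature code.
    return (w[:, :, None] * direction_feats).sum(axis=0)  # (T, C)

# Toy example: T=5 timesteps, C=8 channels.
feats = np.random.randn(4, 5, 8)
scores = np.random.randn(4, 5)
code = fuse_direction_features(feats, scores)
print(code.shape)  # (5, 8)
```

A decoder would then consume the `(T, C)` feature code to predict the character sequence; the softmax normalization is one plausible reading of "weight", chosen here so the four direction contributions stay on a comparable scale.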

Summary

Text Recognition

A large number of papers related to text recognition have been published; an earlier model, for example, can only recognize words from a 90k-word vocabulary. Many recent works, such as [2,4,6,7,10,20], make use of a recurrent neural network (RNN). The authors of [6,20] proposed an end-to-end neural network that treats natural-image recognition as a sequence-labeling problem; their model consists of two blocks: a recursive CNN for image feature extraction, and an RNN with an attention mechanism for recognition. The method in [2] develops a focusing network (FN) to eliminate the attention drift of the attention network (AN). These methods transform the image into a feature sequence from left to right and therefore cannot recognize irregular text (e.g., curved text) in natural images. In contrast, our whole network can be trained end to end using only word-level labels.
