Abstract

Scene text recognition has attracted substantial research interest in computer vision for decades because of its wide range of applications. However, it remains a challenging task due to variations in text appearance, including perspective distortion, text-line curvature, text style, and font size. Almost all existing state-of-the-art methods adopt an attention-based encoder-decoder framework built on RNNs. Inspired by the outstanding performance of the transformer in natural language processing, which also adopts an encoder-decoder framework but discards recurrent units, we develop a recognition network based on the transformer (RNBT). We also modify the loss function to mitigate the poor recognition performance of the encoder-decoder framework on images whose text is longer than that seen in the training set. The whole network can be trained end-to-end using only images and image-level annotations. Extensive experiments on public benchmarks, including CUTE80, SVT-Perspective, IIIT5K, SVT, and the ICDAR datasets, show that the proposed method achieves excellent performance on both regular and irregular text.
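To make the described setup concrete, the sketch below shows one way a transformer-based, RNN-free recognizer of this kind could be wired together: a convolutional backbone produces a feature sequence, and a transformer encoder-decoder predicts characters autoregressively. This is only an illustrative assumption; the abstract does not specify the RNBT backbone, layer counts, vocabulary, or loss modification, so all sizes and names here (TransformerTextRecognizer, d_model, max_len, etc.) are hypothetical.

```python
# Minimal sketch of a transformer encoder-decoder text recognizer
# (no recurrent units). All architectural choices are assumptions,
# not the paper's actual RNBT configuration.
import torch
import torch.nn as nn


class TransformerTextRecognizer(nn.Module):
    def __init__(self, vocab_size=100, d_model=256, nhead=8,
                 num_encoder_layers=4, num_decoder_layers=4, max_len=64):
        super().__init__()
        # Small convolutional backbone: image -> sequence of d_model features.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # collapse height to 1, keep width
        )
        self.pos_emb = nn.Parameter(torch.zeros(1, max_len, d_model))
        self.tgt_emb = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_encoder_layers,
            num_decoder_layers=num_decoder_layers,
            batch_first=True,
        )
        self.classifier = nn.Linear(d_model, vocab_size)

    def forward(self, images, tgt_tokens):
        # images: (B, 3, H, W); tgt_tokens: (B, T) character indices
        feats = self.backbone(images)              # (B, d_model, 1, W')
        feats = feats.squeeze(2).transpose(1, 2)   # (B, W', d_model)
        feats = feats + self.pos_emb[:, :feats.size(1)]
        tgt = self.tgt_emb(tgt_tokens) + self.pos_emb[:, :tgt_tokens.size(1)]
        # Causal mask so each position attends only to earlier characters.
        causal = self.transformer.generate_square_subsequent_mask(tgt_tokens.size(1))
        out = self.transformer(feats, tgt, tgt_mask=causal)
        return self.classifier(out)                # (B, T, vocab_size)


if __name__ == "__main__":
    model = TransformerTextRecognizer()
    logits = model(torch.randn(2, 3, 32, 128), torch.randint(0, 100, (2, 10)))
    print(logits.shape)  # torch.Size([2, 10, 100])
```

In such a design, training would typically use teacher forcing with a cross-entropy loss over characters; the abstract's length-robust loss modification is not detailed here and is therefore not reproduced in this sketch.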
