Abstract

Scene text detection and recognition are popular research topics in computer vision due to their wide range of applications, such as autonomous driving, blind assistance and text translation. However, many existing methods can only detect or recognize text in a single language, even though text in multiple languages often appears within the same scene image, and no effective model exists for multi-language text spotting. In this paper, an end-to-end method for multi-language scene text detection, recognition and script identification is proposed. The method, called MLTS, is short for Multi-Language Scene Text Spotter. By designing a backbone tailored to text and combining two different kinds of attention, MLTS achieves state-of-the-art performance on both joint text localization with script identification in natural images and cropped-word script identification: the precision, recall and F-measure reach 0.7145, 0.6583 and 0.6852 respectively, while the corresponding values of the best existing methods are 0.5759, 0.6207 and 0.5974. Additionally, MLTS achieves comparable performance on ICDAR2013 and ICDAR2015, which demonstrates the effectiveness of the model.
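
The abstract does not specify which two attention mechanisms MLTS combines or how the script-identification head is built. The following is only a minimal PyTorch-style sketch of one plausible arrangement, assuming a channel-attention module paired with spatial self-attention over backbone features; all class names, the `num_scripts` parameter and the pooling choices are hypothetical illustrations, not the paper's actual design.

```python
import torch
import torch.nn as nn

# Hypothetical sketch only: the concrete modules below (channel attention +
# spatial self-attention) are assumptions, not the architecture described in MLTS.

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style reweighting of backbone feature channels."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> per-channel weights in (0, 1)
        w = self.fc(x.mean(dim=(2, 3)))           # (B, C)
        return x * w.unsqueeze(-1).unsqueeze(-1)  # reweight feature maps


class SpatialSelfAttention(nn.Module):
    """Self-attention over the spatial positions of a feature map."""
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)        # (B, H*W, C)
        out, _ = self.attn(seq, seq, seq)
        return out.transpose(1, 2).reshape(b, c, h, w)


class DualAttentionScriptHead(nn.Module):
    """Combines the two attention types before predicting a script label per region."""
    def __init__(self, channels: int, num_scripts: int):
        super().__init__()
        self.channel_attn = ChannelAttention(channels)
        self.spatial_attn = SpatialSelfAttention(channels)
        self.script_head = nn.Linear(channels, num_scripts)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        feats = self.spatial_attn(self.channel_attn(feats))
        return self.script_head(feats.mean(dim=(2, 3)))  # script logits


if __name__ == "__main__":
    head = DualAttentionScriptHead(channels=256, num_scripts=7)
    print(head(torch.randn(2, 256, 16, 64)).shape)  # torch.Size([2, 7])
```

In such a design, channel attention emphasizes features useful for separating scripts, while spatial self-attention lets distant character regions inform each other; the same attended features could feed the detection and recognition branches of an end-to-end spotter.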
