MOSTL: An Accurate Multi-Oriented Scene Text Localization

Fatemeh Naiemi, Vahid Ghods, Hassan Khalesi

doi:10.1007/s00034-021-01674-0

Abstract

Automatic text localization in natural environments is a key component of many applications, including self-driving cars, vehicle identification, and providing scene information to visually impaired people. However, text in natural and irregular scenes varies in orientation, shape, and color, which makes it difficult to detect. In this paper, an accurate multi-oriented scene text localization method (MOSTL) based on convolutional neural networks is presented to detect text efficiently. The proposed method introduces an improved ReLU layer (i.ReLU) and an improved inception layer (i.inception). First, the proposed structure is used to extract low-level visual features. Then, an extra layer is used to improve feature extraction. The i.ReLU and i.inception layers improve the extraction of information that is valuable for text detection. The i.ReLU layers help extract certain low-level features properly. The i.inception layers (especially their 3 × 3 convolutions) capture text of widely varying sizes more effectively than a linear chain of convolution layers without inception layers. The outputs of the i.ReLU and i.inception layers are fed to an extra layer, which enables MOSTL to detect multi-oriented text, including curved and vertical text. We conducted text detection experiments on well-known databases, including ICDAR 2019, ICDAR 2017, ICDAR 2015, ICDAR 2003, and MSRA-TD500, on which MOSTL yielded remarkable performance improvements.
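The abstract attributes MOSTL's robustness to text of widely varying sizes to its inception layers, which apply several kernel sizes in parallel rather than in a linear chain. The abstract does not say how i.inception or i.ReLU differ from their standard counterparts, so the following is only a minimal sketch of a generic GoogLeNet-style inception block, not the authors' layer: it assumes a single-channel input, illustrative 1 × 1, 3 × 3, and 5 × 5 kernels, a 3 × 3 max-pool branch, and a plain ReLU standing in for the unspecified i.ReLU. The helper names (`conv2d_same`, `maxpool_same`, `inception_block`) are hypothetical.

```python
import numpy as np


def conv2d_same(x, k):
    """2D cross-correlation (as used in CNN layers) with zero padding,
    so the output has the same height and width as the input.
    x: (H, W) array, k: (kh, kw) array with odd kh and kw."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))  # zero padding by default
    H, W = x.shape
    out = np.zeros((H, W), dtype=float)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out


def maxpool_same(x, size=3):
    """Stride-1 max pooling; padding with -inf keeps the output size equal
    to the input size without letting padded cells win the max."""
    p = size // 2
    xp = np.pad(x, p, mode="constant", constant_values=-np.inf)
    H, W = x.shape
    out = np.empty((H, W), dtype=float)
    for i in range(H):
        for j in range(W):
            out[i, j] = xp[i:i + size, j:j + size].max()
    return out


def relu(x):
    # Plain ReLU; a stand-in only, since i.ReLU is not defined in the abstract.
    return np.maximum(x, 0.0)


def inception_block(x, kernels):
    """Generic inception-style block (hypothetical, not MOSTL's i.inception):
    parallel 1x1, 3x3 and 5x5 convolution branches plus a 3x3 max-pool branch,
    stacked along a new channel axis -> shape (4, H, W)."""
    x = np.asarray(x, dtype=float)
    branches = [relu(conv2d_same(x, kernels[s])) for s in (1, 3, 5)]
    branches.append(maxpool_same(x, 3))
    return np.stack(branches, axis=0)


# Usage with random illustrative data and weights.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernels = {s: rng.standard_normal((s, s)) for s in (1, 3, 5)}
features = inception_block(image, kernels)
print(features.shape)  # (4, 8, 8)
```

Each branch looks at a different neighborhood size within a single layer, so small and large strokes can both produce strong responses in the same block; a linear chain of 3 × 3 layers only reaches a 5 × 5 receptive field after two layers. A real detector would use multi-channel tensors, learned weights, and a deep-learning framework rather than these explicit loops.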
