Scene text recognition via context modeling for low-quality image in logistics industry

Herui Heng, Peiji Li, Tuxin Guan, Tianyu Yang

doi:10.1007/s40747-022-00916-1

Open Access

Abstract

Text recognition has recently been applied in many fields, such as robot vision, video retrieval, and scene understanding. However, little research has addressed logistics, where camera-captured images of express sheets are often curved, distorted, and low-resolution. In this study, we propose a new method to fill this gap that handles both irregular and low-resolution English text. The approach comprises a rectification module, a convolutional neural network (CNN) feature extractor, a semantic context module (SCM), a global context module (GCM), and a lightweight transformer decoder designed for faster training. Context modeling is central to the method, and the main contributions are as follows: (1) the SCM captures full-image dependencies and generates rich semantic context information; (2) the GCM strengthens the long-range dependencies in the SCM output and passes rich pixel-level information to the self-attention decoder; and (3) to address low-resolution text recognition across the many express-sheet scenes, we introduce Chinese datasets to support intelligent logistics. Experiments on six public benchmarks show that the proposed method achieves better robustness on low-resolution and irregular text images.
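The context-modeling idea can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the abstract does not specify the internals of the SCM or GCM, so the sketch assumes the SCM behaves like full-image self-attention over flattened CNN features, and the GCM like a GCNet-style global context block (attention pooling to one context vector, a transform, then a broadcast add). The function names, the identity query/key/value projections, and the single-linear-map transform are hypothetical simplifications.

```python
import math
from typing import List

# Rows = spatial positions (H*W of a flattened CNN feature map), cols = channels.
Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    # Numerically stable softmax over a 1-D list.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def full_image_self_attention(feats: Matrix) -> Matrix:
    """Hypothetical SCM-like step: every position attends to every position.

    Assumption: identity query/key/value projections for brevity; a real
    module would learn separate projection matrices.
    """
    d = len(feats[0])
    out: Matrix = []
    for q in feats:
        # Scaled dot-product scores against all positions in the image.
        weights = softmax([dot(q, k) / math.sqrt(d) for k in feats])
        # Weighted sum of all positions' features, plus a residual connection.
        ctx = [sum(w * v[c] for w, v in zip(weights, feats)) for c in range(d)]
        out.append([x + y for x, y in zip(q, ctx)])
    return out


def global_context_block(feats: Matrix, w_key: List[float], w_transform: Matrix) -> Matrix:
    """Hypothetical GCM-like step in the style of GCNet (simplified).

    Assumption: the transform is a single linear map instead of GCNet's
    bottleneck with LayerNorm and ReLU.
    """
    d = len(feats[0])
    # 1) One attention logit per position from a 1x1 "key" projection.
    weights = softmax([dot(w_key, f) for f in feats])
    # 2) Attention pooling: a single C-dim context vector for the whole image.
    context = [sum(w * f[c] for w, f in zip(weights, feats)) for c in range(d)]
    # 3) Transform the pooled context.
    delta = [dot(row, context) for row in w_transform]
    # 4) Broadcast-add the same context vector to every position.
    return [[x + dl for x, dl in zip(f, delta)] for f in feats]


if __name__ == "__main__":
    # Toy 2x2 feature map flattened to 4 positions with 3 channels.
    fmap = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
    scm_out = full_image_self_attention(fmap)
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    gcm_out = global_context_block(scm_out, w_key=[0.0, 0.0, 0.0], w_transform=identity)
    print(len(gcm_out), len(gcm_out[0]))  # prints "4 3"
```

Both steps preserve the feature-map shape, so in this sketch they can be stacked in front of a decoder. The contrast the sketch highlights is cost: self-attention compares every pair of positions (quadratic in the number of positions), whereas the global context block computes one shared context vector in a single linear pass, which is why GCNet-style blocks are a common lightweight way to add image-wide context.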
