Rethinking text rectification for scene text recognition

Wenjun Ke, Jianguo Wei, Qingzhi Hou, Hui Feng

doi:10.1016/j.eswa.2023.119647

Abstract

Existing scene text recognition methods incorporate text rectification to reduce text irregularity in images and thereby improve recognition accuracy. Previous text rectification methods aim to convert an irregular text image into a regular form that is easier to recognize. In this study, we examine text rectification for text recognition and identify two issues that previous works have ignored: performance degradation of the recognition network, and cases in which text rectification is unreliable. We therefore rethink the causes of these two issues and propose a rectification-based text recognition network to mitigate them. The proposed network consists of a text rectification component and a text recognition component, and includes a multi-level feature aggregation module that enhances feature learning for character representation. Concretely, we devise a mixed batch training strategy to address the performance degradation of the recognition network, and design a confidence decoding scheme to handle cases in which text rectification is unreliable. Extensive ablation studies verify the positive role of the feature aggregation module in feature learning, as well as the effectiveness of the proposed training strategy and decoding scheme in addressing the two issues. In experiments on public benchmarks, the proposed network outperforms state-of-the-art methods.

Full Text