Abstract

Low-resolution text images are commonplace in real life, and their information is difficult to extract with existing text recognition methods alone. Although this problem can be alleviated by introducing super-resolution (SR) techniques, most existing SR methods fail to treat the stroke regions and background regions of input text images distinctively. In this paper, we propose a text-specific super-resolution network, the Text Enhanced Attention Network (TEAN), to address this problem. First, we compensate for the disadvantages of the traditional thresholding mask operation proposed in the Text Super-Resolution Network (TSRN) by using a deep-learning-based semantic segmentation method to obtain accurate masks as prior semantic information, and we propose a Text-Segmented-Contextual-Attention (TSCA) branch based on these masks. We also design an Orthogonal Contextual Attention Module (OCAM) that works with TSCA to implicitly enhance the stroke regions of LR images. Second, to effectively fuse the shallow and deep features of the SR model, we propose a convolutional structure named the Weight Balanced Fusion Module (WBFM), which improves on the traditional feature-fusion schemes of SR networks. Finally, extensive experiments on the TextZoom dataset demonstrate that the proposed network improves the recognition accuracy of existing text recognition models on low-resolution text images. Compared with recognizing low-resolution images directly, processing them with TEAN improves recognition accuracy by 25.4% on CRNN, 17.4% on ASTER, 17.3% on MORAN, 20.7% on NRTR, 17.3% on SAR, and 15.9% on MASTER, attaining competitive performance against state-of-the-art methods. Furthermore, cross-dataset experiments on IC15_2077 show that TEAN benefits scene text recognition, especially for low-resolution images, even in the presence of a cross-domain shift.
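The abstract describes fusing shallow and deep features with normalized, balanced weights (the WBFM). As a minimal sketch of the general idea, not the paper's actual module (whose convolutional structure and parameters are not given here), the snippet below fuses two feature maps with softmax-normalized weights so the branch contributions stay positive and sum to one; the function and parameter names are hypothetical:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D weight vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def weight_balanced_fusion(shallow, deep, logits):
    """Hypothetical weighted fusion of shallow and deep feature maps.

    `logits` stands in for learnable fusion parameters; softmax keeps the
    two branch weights positive and summing to one, so neither the shallow
    nor the deep branch can dominate arbitrarily.
    """
    w = softmax(logits)
    return w[0] * shallow + w[1] * deep

# With equal logits, both branches contribute equally.
shallow = np.ones((4, 4))   # stand-in for a shallow feature map
deep = np.zeros((4, 4))     # stand-in for a deep feature map
fused = weight_balanced_fusion(shallow, deep, np.array([0.0, 0.0]))
```

In the actual WBFM, the weighting would be learned jointly with the rest of the network (and likely realized with convolutions); this sketch only illustrates the balanced-fusion principle.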
