Pragmatic degradation learning for scene text image super-resolution with data-training strategy

Shengying Yang, Lifeng Xie, Xiaoxiao Ran, Jingsheng Lei, Xiaohong Qian

doi:10.1016/j.knosys.2023.111349

Abstract

Scene text image super-resolution is a difficult problem, and several specific obstacles have limited progress. First, text in natural scenes is semantically ambiguous, so standard super-resolution techniques often fail to preserve meaningful textual content. Second, fonts vary widely, and different fonts require distinct treatment for good results. Third, scene text images often contain long trailing shadows, artifacts, and strong noise that conventional methods handle poorly. To address these issues, we introduce the Higher-Order Degradation-Based Super-Resolution Network (HDSN), built around a pragmatic higher-order degradation modeling process. This process accounts for the characteristics of real scene text images, including Gaussian, Poisson, speckle, and JPEG compression noise, as well as varying levels of blur. Modeling these real-world conditions improves the robustness and adaptability of super-resolution for scene text images. We also recognize that datasets are sparse and that corresponding paired images for training are lacking. To overcome this limitation, we introduce a text image pre-training strategy, which proves highly effective in improving recognition accuracy. Experiments on TextZoom show substantial improvements over existing methods: HDSN achieves average recognition rates of 67.2% on ASTER, 63.2% on MORAN, and 58.0% on CRNN, surpassing available approaches. Our source code is available at https://github.com/syyang2022/HDSN.
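
To make the idea of higher-order degradation concrete, the following is a minimal sketch, not the HDSN implementation. It assumes that "higher-order" means applying a randomized blur-plus-noise stage several times in sequence to a grayscale image with values in [0, 1]. All function names, parameter ranges, and noise strengths are illustrative assumptions and are not taken from the paper. JPEG compression is omitted because it requires an image codec.

```python
import numpy as np


def gaussian_kernel(size=5, sigma=1.0):
    """Return a normalized 2-D Gaussian kernel of shape (size, size)."""
    ax = np.arange(size) - (size - 1) / 2.0
    k1d = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k2d = np.outer(k1d, k1d)
    return k2d / k2d.sum()


def blur(img, sigma, size=5):
    """Blur a 2-D float image with a Gaussian kernel (reflect padding)."""
    kernel = gaussian_kernel(size, sigma)
    pad = size // 2
    padded = np.pad(img, pad, mode="reflect")
    h, w = img.shape
    out = np.zeros_like(img)
    # Direct convolution as a sum of shifted, weighted copies of the image.
    for dy in range(size):
        for dx in range(size):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def add_noise(img, rng, kind):
    """Add one of the noise types named in the abstract (assumed strengths)."""
    if kind == "gaussian":
        return img + rng.normal(0.0, 0.05, img.shape)   # additive
    if kind == "poisson":
        scale = 255.0                                   # assumed intensity scale
        return rng.poisson(img * scale) / scale         # signal-dependent
    if kind == "speckle":
        return img * (1.0 + rng.normal(0.0, 0.1, img.shape))  # multiplicative
    raise ValueError(f"unknown noise kind: {kind}")


def degrade_once(img, rng):
    """One degradation stage: random blur level, then a random noise type."""
    sigma = rng.uniform(0.2, 2.0)                       # assumed blur range
    kind = rng.choice(["gaussian", "poisson", "speckle"])
    out = add_noise(blur(img, sigma), rng, kind)
    return np.clip(out, 0.0, 1.0)


def higher_order_degrade(img, order=2, seed=0):
    """Apply `order` degradation stages in sequence to an image in [0, 1]."""
    rng = np.random.default_rng(seed)
    out = np.clip(np.asarray(img, dtype=np.float64), 0.0, 1.0)
    for _ in range(order):
        out = degrade_once(out, rng)
    return out


if __name__ == "__main__":
    # Synthetic "text-like" image: a bright bar on a dark background.
    hr = np.zeros((16, 64))
    hr[4:12, 8:56] = 1.0
    lr = higher_order_degrade(hr, order=2, seed=0)
    print(lr.shape, float(lr.min()), float(lr.max()))
```

Under these assumptions, each clean image passed through `higher_order_degrade` yields a synthetic degraded counterpart, which is one way such a pipeline could supply paired training data when real pairs are scarce.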

Full Text