Abstract
In this paper, we propose data augmentation approaches that provide more diverse training images and thus help train more robust deep models for the Scene Text Recognition (STR) task. The two methods are Random Blur Region (RBR) and Random Blur Units (RBUs). We first introduce RBR, which is designed for the STR task: during training, RBR randomly selects a region and sets the pixels in that region to an average value. However, while RBR provides more varied training samples for STR, it may make the samples ambiguous and thereby reduce recognition accuracy. To address this problem, we further propose RBUs, which divides the blur region into several units, with the pixels of each unit sharing the same value. In this way, RBUs provide additional readable training samples and help train more robust deep models. Extensive experiments on several STR datasets show that RBUs achieve highly competitive performance. Moreover, RBUs are complementary to commonly used data augmentation techniques.
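To make the mechanism concrete, below is a minimal NumPy sketch of how RBR and RBUs could work. The function names, the default grid size, and the choice of per-region versus per-unit mean are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def random_blur_region(img, max_frac=0.4, rng=None):
    """RBR sketch: pick a random rectangle and fill it with its mean value."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape[:2]
    # Sample the rectangle size (bounded by max_frac of the image) and position.
    rh = int(rng.integers(2, max(3, int(h * max_frac) + 1)))
    rw = int(rng.integers(2, max(3, int(w * max_frac) + 1)))
    y = int(rng.integers(0, h - rh + 1))
    x = int(rng.integers(0, w - rw + 1))
    out = img.copy()
    region = out[y:y + rh, x:x + rw]
    out[y:y + rh, x:x + rw] = region.mean(axis=(0, 1), keepdims=True)
    return out

def random_blur_units(img, units=(4, 4), max_frac=0.4, rng=None):
    """RBUs sketch: split the selected region into a grid of units; the
    pixels of each unit share that unit's mean, so coarse character
    strokes remain readable."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape[:2]
    rh = int(rng.integers(units[0], max(units[0] + 1, int(h * max_frac) + 1)))
    rw = int(rng.integers(units[1], max(units[1] + 1, int(w * max_frac) + 1)))
    y = int(rng.integers(0, h - rh + 1))
    x = int(rng.integers(0, w - rw + 1))
    out = img.copy()
    uh, uw = rh // units[0], rw // units[1]
    for i in range(units[0]):
        for j in range(units[1]):
            # Each grid cell is replaced by its own mean value.
            y0, x0 = y + i * uh, x + j * uw
            y1 = y + rh if i == units[0] - 1 else y0 + uh
            x1 = x + rw if j == units[1] - 1 else x0 + uw
            cell = out[y0:y1, x0:x1]
            out[y0:y1, x0:x1] = cell.mean(axis=(0, 1), keepdims=True)
    return out
```

As a usage example, applying either function to a 32x100 cropped word image before feeding it to the recognizer yields an augmented sample per training step; RBR flattens the whole region to a single value, whereas RBUs preserve a coarse, mosaic-like version of the occluded strokes.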
Highlights
Text carries rich semantic detail and has been exploited in many artificial intelligence applications, such as autonomous driving, travel translation, and image retrieval
This paper focuses on Scene Text Recognition (STR) [1], [2], which has become a critical task in many applications
Building on Random Blur Region (RBR), we propose a refined data augmentation method, Random Blur Units (RBUs), that solves the ambiguity problem caused by RBR
Summary
Text carries rich semantic detail and has been exploited in many artificial intelligence applications, such as autonomous driving, travel translation, and image retrieval. This paper focuses on Scene Text Recognition (STR) [1], [2], which has become a critical task in many applications. Because text appearances in the real world vary widely and the conditions under which these scenes are captured are imperfect, most traditional OCR methods fail on STR tasks. One possible way to feed more data into deep networks is to manually annotate additional text images; with such labelled images, STR models are expected to achieve higher prediction accuracy. Alternatively, synthetic data is simple to generate and requires no manual annotations, and synthetic engines can provide text images with various styles and shapes for the STR task