Abstract
Scene text recognition methods built on the encoder–decoder framework generally assume that the characters within a text instance have roughly the same proportions. However, this assumption does not always hold in complex scene images. To adaptively adjust the receptive field according to the varying fonts in scene text images, we propose a Dynamic Receptive Field Adaption Framework, which consists of a Memory Attention (MA) module and a Dynamic Feature Adaptive (DFA) module. MA perceives historical location information to adapt to changes in character position during decoding. DFA dynamically selects the most discriminative features from feature maps at different levels. Moreover, MA and DFA can be easily extended to existing attention-based and transformer-based text recognition methods to improve their performance. Extensive experiments on public benchmark datasets, including IIIT-5K, SVT, SVTP, CUTE80, ReCTS, LSVT, and RCTW, demonstrate the effectiveness and robustness of our method.
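To make the multi-level selection idea concrete, the sketch below shows one plausible way a DFA-style module could softly gate between feature maps of different levels at each spatial position. This is only a minimal illustration under assumed shapes and module names (DynamicFeatureSelector is hypothetical), not the authors' implementation.

```python
# A minimal sketch (assumptions, not the paper's implementation) of dynamically
# selecting among multi-level feature maps with position-wise learned gates.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicFeatureSelector(nn.Module):
    """Fuse multi-level feature maps with per-position softmax gates."""

    def __init__(self, channels: int, num_levels: int):
        super().__init__()
        # A 1x1 conv predicts one gate logit per level at every spatial position.
        self.gate = nn.Conv2d(channels * num_levels, num_levels, kernel_size=1)

    def forward(self, features: list) -> torch.Tensor:
        # features: list of tensors, each (B, C, H, W), assumed already resized
        # to a common resolution (e.g. via F.interpolate on coarser levels).
        stacked = torch.cat(features, dim=1)            # (B, C*L, H, W)
        weights = F.softmax(self.gate(stacked), dim=1)  # (B, L, H, W)
        fused = sum(w.unsqueeze(1) * f
                    for w, f in zip(weights.unbind(dim=1), features))
        return fused                                    # (B, C, H, W)


if __name__ == "__main__":
    levels = [torch.randn(2, 64, 8, 32) for _ in range(3)]
    selector = DynamicFeatureSelector(channels=64, num_levels=3)
    print(selector(levels).shape)  # torch.Size([2, 64, 8, 32])
```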