Abstract

In this paper, a novel dual-pathway-fusion-based sequence-to-sequence learning model (DPF-S2S) is proposed for text recognition in the wild, which mainly focuses on enriching the spatial information and extracting high-dimensional representation features to assist decoding. In particular, a double alignment module is developed to solve the problem of text misalignment, where both position and vision information are well considered. Moreover, a global fusion module is deployed to enrich 2D information in the aligned attention maps, which benefits accurate recognition from complicated scenes with arbitrary text shapes and poor imaging conditions. Benchmark evaluations on seven datasets have demonstrated the superiority of proposed DPF-S2S model in comparison to other state-of-the-art text recognition methods, which presents great competitiveness on identifying texts in both regular and irregular scenes. In addition, extensive ablation studies have been carried out, which validate the effectiveness of applied strategies in proposed DPF-S2S.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.