Abstract
End-to-end text spotting aims to combine scene text detection and recognition in a unified framework. How the relationship between the two sub-tasks is handled plays a pivotal role in designing effective spotters. While polygon-based or segmentation-based methods eliminate heuristic post-processing, they still face challenges such as background noise and a high computational burden. In this study, we introduce DPGS, a coarse-to-fine learning framework that uses Dynamic Points Generation for text Spotting. DPGS simultaneously learns character representations for both the detection and recognition tasks. Specifically, for each text instance, we represent the character sequence as ordered points and model them with learnable point queries. This approach progressively selects appropriate key points that cover the characters and leverages group attention to associate similar information from different positions, improving detection accuracy. After passing through a single decoder, the point queries encode text semantics and locations, so that simple prediction heads can decode them into the central line, boundary, script, and confidence of the text. Additionally, we introduce an adaptive cooperative criterion to combine more useful feature knowledge, enhancing training efficiency. Extensive experiments show the superiority of DPGS in scene text detection and recognition. Compared with the respective top-1 method on each dataset, DPGS improves average recognition accuracy by 3.7%, 1.9%, and 0.7% on the Total-Text, ICDAR15, and CTW1500 datasets, respectively.
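The abstract gives no layer-level detail, so the following is only an illustrative sketch of how decoded point queries might be mapped to a central line, boundary, script, and confidence through simple linear heads. The tensor shapes, the head layout, the function names `make_heads` and `decode`, and the randomly initialised weights are all assumptions made for illustration; they are not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_heads(dim, vocab_size, seed=0):
    """Hypothetical linear heads; random weights stand in for trained ones."""
    rng = np.random.default_rng(seed)
    return {
        "center": rng.normal(0, 0.02, (dim, 2)),            # (x, y) on the central line
        "boundary": rng.normal(0, 0.02, (dim, 4)),          # top and bottom boundary points
        "char": rng.normal(0, 0.02, (dim, vocab_size + 1)),  # characters plus a blank class
        "conf": rng.normal(0, 0.02, (dim, 1)),              # per-point confidence logit
    }

def decode(queries, heads):
    """queries: (T, N, dim) = T text instances, N ordered point queries each."""
    t, n, _ = queries.shape
    # Normalised coordinates in [0, 1] for each ordered point.
    center = sigmoid(queries @ heads["center"])                       # (T, N, 2)
    boundary = sigmoid(queries @ heads["boundary"]).reshape(t, n, 2, 2)
    # One character class per point, read in point order.
    char_ids = (queries @ heads["char"]).argmax(axis=-1)              # (T, N)
    # Instance confidence: average the per-point logits, then squash.
    conf = sigmoid((queries @ heads["conf"]).mean(axis=1)).squeeze(-1)  # (T,)
    return center, boundary, char_ids, conf

queries = np.random.default_rng(1).normal(size=(3, 25, 64))
center, boundary, char_ids, conf = decode(queries, make_heads(64, 96))
print(center.shape, boundary.shape, char_ids.shape, conf.shape)
```

Because every output is read from the same ordered set of point queries, the point at index i of the central line, the boundary pair at index i, and the character class at index i all refer to the same position in the text instance, which is the shared representation the abstract describes.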