Abstract

Scene text with an irregular layout is difficult to recognize. To address this difficulty, a Sequential Transformation Attention-based Network (STAN), comprising a sequential transformation network and an attention-based recognition network, is proposed for general scene text recognition. The sequential transformation network rectifies irregular text by decomposing the task into a series of patch-wise basic transformations, followed by a grid projection submodule that smooths the junctions between neighboring patches. The entire rectification process can be trained end-to-end in a weakly supervised manner, requiring only images and their corresponding ground-truth text. The attention-based recognition network then predicts a character sequence from the rectified image. Experiments on several benchmarks demonstrate that STAN achieves state-of-the-art performance on both regular and irregular text.
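To illustrate why junctions between neighboring patches need smoothing, here is a minimal sketch, not the paper's implementation. It assumes each patch carries a hypothetical (scale, shift) transform on the vertical coordinate, and it uses plain linear interpolation between patch centers as a stand-in for the grid projection submodule, whose actual form the abstract does not specify. The names `patchwise_grid`, `smoothed_grid`, and `max_jump` are illustrative only.

```python
from typing import List, Tuple

# Hypothetical per-patch transform on the vertical coordinate: y' = scale * y + shift
Affine = Tuple[float, float]


def patchwise_grid(width: int, params: List[Affine]) -> List[Affine]:
    """Give every image column the transform of the patch it falls in (piecewise constant)."""
    n = len(params)
    return [params[min(x * n // width, n - 1)] for x in range(width)]


def smoothed_grid(width: int, params: List[Affine]) -> List[Affine]:
    """Linearly interpolate transforms between patch centers (assumed smoothing scheme)."""
    n = len(params)
    patch_w = width / n
    centers = [(i + 0.5) * patch_w for i in range(n)]
    out: List[Affine] = []
    for x in range(width):
        cx = x + 0.5  # center of column x
        if cx <= centers[0]:
            out.append(params[0])
        elif cx >= centers[-1]:
            out.append(params[-1])
        else:
            i = min(int((cx - centers[0]) / patch_w), n - 2)
            t = (cx - centers[i]) / patch_w
            a, b = params[i], params[i + 1]
            out.append(((1 - t) * a[0] + t * b[0], (1 - t) * a[1] + t * b[1]))
    return out


def max_jump(grid: List[Affine]) -> float:
    """Largest change in vertical shift between two adjacent columns."""
    return max(abs(grid[x + 1][1] - grid[x][1]) for x in range(len(grid) - 1))


# Three patches with different vertical shifts (made-up values for illustration).
params = [(1.0, 0.0), (1.0, 4.0), (1.0, -2.0)]
raw = patchwise_grid(12, params)
smooth = smoothed_grid(12, params)
print(max_jump(raw), max_jump(smooth))
```

With piecewise-constant transforms, the sampling grid jumps abruptly at each patch boundary, which would tear the rectified image. Interpolating the parameters spreads each change across the columns between patch centers, so adjacent columns differ only slightly.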
