Abstract

Scene text recognition has gained increasing attention in recent years, as it can connect products that lack an open interface in the IoT. The non-local network is particularly popular in text recognition because it can aggregate the temporal information of the input. However, existing text recognition methods based on RNN encoder-decoder structures suffer from attention drift, especially in complex ship name recognition scenarios, because the features these methods extract are highly similar. To address this problem, this paper proposes a novel text recognition approach named Multi Global Context-aware Transformer (MG-Cat). The proposed approach has two main properties: (1) a Global Context block that captures the global relationships among pixels inside the encoder, and (2) multiple global context-aware attention modules stacked in the encoding process. In this way, MG-Cat learns a more robust intermediate feature representation in the text recognition pipeline. In addition, a new ship name dataset was collected to evaluate the proposed approach. Extensive experiments on this dataset verify the effectiveness of the approach, and the results demonstrate the generalization ability of the squeeze-and-excitation global context attention module.
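The abstract does not specify the internals of the Global Context block, but blocks of this name in the literature (GCNet-style) typically pool the feature map into a single context vector via position-wise attention, pass it through a squeeze-and-excitation-style bottleneck, and fuse it back into every position. Below is a minimal NumPy sketch under that assumption; the weight names (`wk`, `w1`, `w2`) and the reduction ratio are illustrative, not taken from the paper.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

def global_context_block(x, wk, w1, w2):
    """Sketch of a GC-style attention block on one feature map.

    x  : (C, H, W) input features
    wk : (C,)      key projection producing one attention logit per position
    w1 : (C, C_r)  bottleneck down-projection (squeeze)
    w2 : (C_r, C)  bottleneck up-projection (excitation)
    All parameter names are illustrative assumptions.
    """
    C, H, W = x.shape
    flat = x.reshape(C, H * W)               # (C, N): N spatial positions
    attn = softmax(wk @ flat)                # (N,): attention over all positions
    context = flat @ attn                    # (C,): global context vector
    hidden = np.maximum(context @ w1, 0.0)   # bottleneck + ReLU
    delta = hidden @ w2                      # (C,): channel-wise modulation
    return x + delta[:, None, None]          # broadcast fusion back to every pixel
```

Because the context vector is shared by all positions, the block adds global, long-range information at a cost that is linear in the number of pixels, which is why such blocks can be stacked cheaply throughout an encoder.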
