Abstract

Natural scene text recognition algorithms tend to focus too heavily on local character classification and ignore the global information of the entire text. To address this problem, a natural scene text recognition algorithm based on multi-network convergence and a multi-head attention mechanism is proposed. First, the algorithm uses a multi-network convergence structure with multiple residual modules to capture contextual features and semantic features from the visual features. Then, for character prediction, a multi-head attention encoder is proposed, which concatenates position information, visual features, context features and semantic features into a new feature space. Finally, the new feature space is reweighted by the self-attention mechanism, which attends to the connections between feature sequences and improves the accuracy of the predicted text. On SVT (regular text) and ICDAR2015 (irregular text), recognition accuracy reached 91.4% and 82.4%, respectively, about 1.8% and 2.4% higher than current popular algorithms. Experimental results show that the model makes better use of position features, global semantic features and context features to identify text content more accurately.
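
The fusion and reweighting steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that the four feature streams share a sequence length and are concatenated along the channel axis, and it uses random projection matrices in place of learned weights. The function names (`fuse_features`, `multi_head_self_attention`), the feature widths, and the head count are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_features(position, visual, context, semantic):
    # Assumption: each input has shape (T, d_i); they are joined channel-wise.
    return np.concatenate([position, visual, context, semantic], axis=-1)


def multi_head_self_attention(x, num_heads, rng):
    # x: (T, d_model). Random matrices stand in for learned projections.
    T, d_model = x.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        for _ in range(4)
    )

    def split(m):
        # (T, d_model) -> (num_heads, T, d_head)
        return m.reshape(T, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    # Scaled dot-product attention per head: (num_heads, T, T)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    heads = weights @ v  # (num_heads, T, d_head)
    merged = heads.transpose(1, 0, 2).reshape(T, d_model)
    return merged @ w_o, weights


# Hypothetical sizes: 25 sequence steps, 16 + 64 + 24 + 24 = 128 channels.
rng = np.random.default_rng(0)
T = 25
position = rng.standard_normal((T, 16))
visual = rng.standard_normal((T, 64))
context = rng.standard_normal((T, 24))
semantic = rng.standard_normal((T, 24))

fused = fuse_features(position, visual, context, semantic)
reweighted, attn = multi_head_self_attention(fused, num_heads=8, rng=rng)
print(fused.shape, reweighted.shape, attn.shape)
```

Each row of `attn` holds one sequence step's attention weights over all steps, so the reweighted output at every position mixes information from the whole sequence. This is the sense in which self-attention accounts for the connections between feature sequences.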
