Abstract

End-to-end frameworks with Connectionist Temporal Classification (CTC) have achieved great success in text recognition. Despite the high accuracy they reach with deep learning, CTC-based text recognition methods still suffer from poor alignment (character boundary positioning) in many applications. To address this issue, we propose an end-to-end text recognition method based on robust prototype learning. In the new CTC framework, we formulate the blank as the rejection of character classes and use a one-vs-all prototype classifier as the output layer of the convolutional neural network. For network learning, a forced alignment between frames and character labels is computed, and the most aligned frame is up-weighted in the CTC training strategy to reduce estimation errors in decoding. Experiments on handwritten text recognition with four benchmark datasets in different languages show that the proposed method consistently improves both the accuracy and the alignment of the CTC-based text recognition baseline.
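
The idea of treating the blank as the rejection of character classes can be illustrated with a small sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the function name `frame_posteriors`, the squared-Euclidean distance, the per-class thresholds, the sigmoid acceptance score, and the final normalization are hypothetical choices, not the paper's actual formulation.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def frame_posteriors(x, prototypes, thresholds, scale=1.0):
    """Return [blank, class_0, class_1, ...] scores for one frame feature x.

    Assumed model (illustrative only): each class k has one prototype m_k and
    a threshold tau_k, and accepts the frame with probability
    sigmoid(scale * (tau_k - ||x - m_k||^2)).
    """
    accept = []
    for m, tau in zip(prototypes, thresholds):
        d2 = sum((xi - mi) ** 2 for xi, mi in zip(x, m))
        accept.append(sigmoid(scale * (tau - d2)))

    # Blank: every one-vs-all classifier rejects the frame.
    blank = math.prod(1.0 - p for p in accept)

    scores = [blank]
    for k, pk in enumerate(accept):
        # Class k: accepted by classifier k and rejected by all the others.
        others = math.prod(1.0 - pj for j, pj in enumerate(accept) if j != k)
        scores.append(pk * others)

    # Normalize so the scores can serve as a per-frame distribution for CTC.
    total = sum(scores)
    return [s / total for s in scores]


# Two hypothetical character prototypes in a 2-D feature space.
prototypes = [[0.0, 0.0], [4.0, 0.0]]
thresholds = [1.0, 1.0]

on_prototype = frame_posteriors([0.0, 0.0], prototypes, thresholds)
between = frame_posteriors([2.0, 0.0], prototypes, thresholds)
```

Under these assumptions, a frame lying on a prototype scores highest for that character, while a frame lying between the two prototypes is rejected by both classifiers and scores highest for the blank. This is the behavior one would expect to help localize character boundaries.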

