Abstract

Modeling accurate text recognizers for scanned Devanagari scripts is a challenging task, primarily due to the ingrained complexities of characters, vowels, conjuncts, and modifiers in the word formation of the script. This paper presents an attention-based convolutional recurrent neural network (CRNN) for improving the text recognition accuracy of Devanagari script images. The proposed model is an encoder–decoder network with an attention mechanism. The encoder is a convolutional neural network (CNN), which extracts significant, high-level features from the input images and emits them as an ordered sequence. The decoder uses a bidirectional long short-term memory (BLSTM) network with an attention mechanism, followed by a connectionist temporal classification (CTC) layer, to predict the text in the images. The attention mechanism enables the decoder to selectively exploit the encoder's high-level features. We have trained, validated, and tested the proposed model on three datasets: the publicly available IIIT-H scene image dataset, a synthetic dataset, and a real dataset curated during this work. We have also compared the proposed model's results with prior work in the area on the same three datasets, and found that our model outperforms the other models on all three.
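The pipeline described above (CNN encoder producing an ordered feature sequence, an attention-augmented BLSTM decoder, and a CTC output layer) can be sketched roughly as follows. This is a minimal illustration in PyTorch, not the authors' implementation: the layer sizes, the 32-pixel input height, the simple softmax reweighting used as the attention step, and the class count are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttnCRNN(nn.Module):
    """Sketch of a CNN encoder + attention-weighted BLSTM decoder + CTC head."""
    def __init__(self, num_classes, hidden=128):
        super().__init__()
        # CNN encoder: extracts high-level features, shrinking height and width
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        feat_dim = 64 * 8  # assumes 32-px-high input -> height 8 after two poolings
        self.blstm = nn.LSTM(feat_dim, hidden, bidirectional=True, batch_first=True)
        # simple additive attention: a learned score per encoder timestep
        self.attn = nn.Linear(2 * hidden, 1)
        self.fc = nn.Linear(2 * hidden, num_classes)  # num_classes includes the CTC blank

    def forward(self, x):                             # x: (B, 1, 32, W)
        f = self.cnn(x)                               # (B, 64, 8, W/4)
        b, c, h, w = f.shape
        # flatten each image column into one feature vector -> ordered sequence
        seq = f.permute(0, 3, 1, 2).reshape(b, w, c * h)   # (B, T, feat_dim)
        out, _ = self.blstm(seq)                      # (B, T, 2*hidden)
        weights = torch.softmax(self.attn(out), dim=1)     # (B, T, 1), sums to 1 over T
        out = out * weights * out.size(1)             # reweight timesteps by attention
        return self.fc(out).log_softmax(-1)           # (B, T, num_classes) for CTC

# Usage: a batch of two 32x128 grayscale word images, 80-symbol vocabulary
model = AttnCRNN(num_classes=80)
log_probs = model(torch.randn(2, 1, 32, 128))         # (2, 32, 80)

# CTC loss consumes the per-timestep log-probabilities in (T, B, C) layout
ctc = nn.CTCLoss(blank=0)
targets = torch.randint(1, 80, (2, 10))               # dummy label sequences
loss = ctc(log_probs.permute(1, 0, 2), targets,
           torch.full((2,), 32, dtype=torch.long),    # input lengths
           torch.full((2,), 10, dtype=torch.long))    # target lengths
```

During decoding, the CTC layer's per-timestep distributions would be collapsed (e.g. by greedy or beam-search decoding) into the final character string; that step is omitted here for brevity.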
