Abstract

In this study, a novel technique is proposed to recognize printed text in images for Urdu, a low-resource language with a scarcity of benchmark datasets. The proposed technique, called Efficient CRNN, uses depthwise separable convolutional and bidirectional gated recurrent unit layers, followed by a connectionist temporal classification loss. It is computationally more efficient than existing text recognition techniques, requiring fewer parameters and computations. A multi-font printed Urdu text line corpus is also presented, consisting of 245,000 text line images rendered using 7 different fonts. The corpus, called MMU-Extension-22, is used to train and evaluate existing state-of-the-art end-to-end text recognition techniques, as well as Efficient CRNN. The proposed technique is trained on 196,000 text line images and tested on the remaining 49,000 images. Efficient CRNN achieved minimum character and word error rates of 0.91% and 1.49%, respectively, for Urdu text line recognition under different settings, outperforming existing computationally more complex techniques. The simple nature of the proposed technique makes it not only more efficient but also more robust for Urdu text line recognition: compared to the best-performing existing recurrent neural network based technique, it reduces the character error rate by 2.23 percentage points, a 71% relative decrease (percentage decrease = 100 × (baseline value − changed value) / baseline value). The proposed technique also outperforms a Vision Transformer-based network, reducing the character error rate by 0.79 percentage points, a 41% decrease in error, while requiring 49.16% fewer parameters than the baseline Vision Transformer technique.
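The efficiency claim rests on a standard property of depthwise separable convolutions, and the error reductions use the percentage-decrease formula stated above. Both can be sketched with simple arithmetic; the layer shape below (3×3 kernel, 64→128 channels) is a hypothetical illustration, not a configuration taken from the paper:

```python
def standard_conv_params(k, c_in, c_out):
    # Standard 2D convolution: one k x k x c_in kernel per output channel
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # Depthwise step: one k x k kernel per input channel,
    # followed by a pointwise (1x1) convolution mixing channels
    return k * k * c_in + c_in * c_out

def pct_decrease(baseline, changed):
    # Percentage decrease as defined in the abstract
    return 100 * (baseline - changed) / baseline

# Hypothetical layer: 3x3 kernel, 64 input channels, 128 output channels
std = standard_conv_params(3, 64, 128)        # 73,728 parameters
sep = depthwise_separable_params(3, 64, 128)  # 8,768 parameters
print(round(pct_decrease(std, sep), 1))       # ~88.1% fewer parameters

# Consistency check on the abstract's figures: a 2.23-point CER reduction
# down to 0.91% implies a baseline CER of about 3.14%, i.e. a ~71% decrease.
print(round(pct_decrease(3.14, 0.91)))        # 71
```

The depthwise-plus-pointwise factorization is what drives the parameter savings: the mixing across channels is done once with 1×1 kernels rather than inside every spatial kernel.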
