Urdu Nasta’liq text recognition system based on multi-dimensional recurrent neural network and statistical features

Saeeda Naz, Riaz Ahmad, Saad B Ahmed, Muhammad I Razzak, Arif I Umar, Syed H Shirazi

doi:10.1007/s00521-015-2051-4

Abstract

Character recognition for cursive scripts such as Arabic and handwritten English and French is a challenging task, and it becomes more complicated for Urdu Nasta'liq text because this script is more complex than Arabic. Recurrent neural networks (RNNs) have shown excellent performance for English, French, and cursive Arabic script owing to their sequence learning property. Most recent approaches perform segmentation-based character recognition; however, owing to the complexity of the Nasta'liq script, the segmentation error is considerably higher than for the Arabic Naskh script. RNNs have provided promising results in such scenarios. In this paper, we achieve high accuracy for Urdu Nasta'liq using statistical features and multi-dimensional long short-term memory. We present a robust feature extraction approach that extracts features using a right-to-left sliding window. Results show that the selected features significantly reduce the label error. For evaluation, we use the Urdu printed text images dataset and compare the proposed approach with recent work. The system achieves 94.97% recognition accuracy for unconstrained printed Nasta'liq text lines and outperforms the state-of-the-art results.
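As an illustration of the right-to-left sliding window idea, the following is a minimal sketch and not the feature set used in the paper. The function name `sliding_window_features`, the window width and step, and the two statistics computed per window (ink density and normalized vertical centre of ink) are all assumptions chosen for illustration. The text-line image is assumed to be a binarized grid in which 1 marks ink and 0 marks background.

```python
from typing import List, Sequence, Tuple


def sliding_window_features(
    image: Sequence[Sequence[int]],
    window_width: int = 4,
    step: int = 4,
) -> List[Tuple[float, float]]:
    """Return one (density, centre) pair per window, scanning right to left.

    Hypothetical statistics, assumed for illustration only:
      density: fraction of ink pixels inside the window
      centre:  mean row index of ink pixels, scaled to [0, 1]
    """
    height = len(image)
    width = len(image[0])
    features: List[Tuple[float, float]] = []

    right = width  # exclusive right edge; start at the right margin
    while right > 0:
        left = max(0, right - window_width)

        # Collect the row index of every ink pixel in this window.
        ink_rows = [
            r
            for r, row in enumerate(image)
            for c in range(left, right)
            if row[c]
        ]

        area = height * (right - left)
        density = len(ink_rows) / area
        if ink_rows:
            centre = (sum(ink_rows) / len(ink_rows)) / max(height - 1, 1)
        else:
            centre = 0.0  # empty window: no ink to locate

        features.append((density, centre))
        right -= step  # move the window towards the left margin

    return features
```

Each window yields one feature vector, so a text line becomes a sequence ordered in the reading direction of Urdu, which is the form a sequence learner such as an LSTM consumes.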

Full Text