In spite of various applications of digit, letter and word recognition, only a few studies have dealt with Persian scripts. In this paper, deep neural networks are utilized through different DenseNet and Xception architectures, being further boosted by means of data augmentation and test time augmentation. Dividing the datasets to training, validation and test sets, and utilizing k-fold cross-validation, the comparison of the proposed method with various state-of-the-art alternatives is performed. Three datasets: HODA, Sadri and Iranshahr are used, which offer the most comprehensive collections of samples in terms of handwriting styles and the forms each letter may take depending on its position within a word. On the HODA dataset, we achieve recognition rates of 99.49% and 98.10% for digits and characters, being 99.72%, 89.99% and 98.82% for digits, characters and words from the Sadri dataset, respectively, as well as 98.99% for words from the Iranshahr dataset, each of which outperforms the performances achieved by the most advanced alternative networks, namely ResNet50 and VGG16. An additional contribution of the paper arises from its capability of words recognition as a holistic image classification. This improves the resulting speed and versatility significantly, as it does not require explicit character models, unlike earlier alternatives such as hidden Markov models and convolutional recursive neural networks. In addition, computation times have been compared with alternative state-of-the-art models and better performance has been observed.
Read full abstract