Abstract
Motivation: Handwritten text recognition has been studied extensively over the last few decades. Many innovative approaches have been developed, and state-of-the-art accuracy has been achieved for English, Chinese and Indian scripts. Recent work on cursive scripts such as Arabic and Urdu has also achieved remarkable accuracy. For the Sindhi script, however, existing systems have not shown significant results, and the problem remains an open challenge. Variations in writing style, joined text, ligature overlapping and other characteristics of handwritten Sindhi text make the problem more complex. Objectives: This study proposes a deep residual convolutional neural network (CNN) with shortcut connections and summation fusion for automatic feature extraction and classification of handwritten Sindhi characters. Method: To strengthen the feature representation ability of the network, the features of the convolutional layers within each residual block are fused together and combined with the output of the previous residual block. The proposed network is trained on a custom-developed handwritten Sindhi character dataset. To address the small size of the dataset, data augmentation with rotation, flipping and image enhancement techniques is used. Findings: The experimental results show that the proposed model outperforms the best previously published results for handwritten Sindhi character recognition. Novelty: This is the first study to propose a deep residual network with summation fusion for Sindhi handwritten text recognition.
Keywords: Handwritten Sindhi character recognition; Sindhi text recognition; cursive text recognition; deep learning; ResNet; convolutional neural network
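The Method statement describes fusing the convolutional features inside a residual block by summation and then combining them with the output of the previous block. The abstract does not give the layer configuration, so the following is only a minimal NumPy sketch of one plausible reading: a block with two 3x3 convolutions whose activations are summed and then added to the block input. The function names (conv3x3_same, fused_residual_block), the two-layer depth, the ReLU placement and the absence of batch normalization are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def conv3x3_same(x, w):
    """Naive 3x3 convolution with zero padding (cross-correlation form).

    x: feature map of shape (C_in, H, W)
    w: kernels of shape (C_out, C_in, 3, 3)
    returns: feature map of shape (C_out, H, W)
    """
    _, h, wd = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero-pad H and W by 1
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            patch = padded[:, dy:dy + h, dx:dx + wd]  # shifted view of the input
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def fused_residual_block(x, w1, w2):
    """Hypothetical residual block with summation fusion (assumed layout).

    x is taken to be the output of the previous residual block. The channel
    count must be the same for x, w1 and w2 so that the sums are defined.
    """
    f1 = relu(conv3x3_same(x, w1))   # features of the first conv layer
    f2 = relu(conv3x3_same(f1, w2))  # features of the second conv layer
    fused = f1 + f2                  # summation fusion of the conv features
    return relu(fused + x)           # shortcut: add the previous block's output

# Usage on a random 4-channel 8x8 feature map.
rng = np.random.default_rng(0)
x = rng.random((4, 8, 8))
w1 = rng.standard_normal((4, 4, 3, 3)) * 0.1
w2 = rng.standard_normal((4, 4, 3, 3)) * 0.1
y = fused_residual_block(x, w1, w2)
print(y.shape)
```

Under this reading, a standard residual block would return relu(f2 + x); the only difference in the sketch is that the intermediate features f1 also contribute to the sum.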
Highlights
Despite advances in offline and online document text recognition, Sindhi handwritten text recognition remains an unsolved problem
The results show that residual blocks with summation fusion outperform standard residual blocks
A large body of research has been carried out on handwritten text recognition, with state-of-the-art accuracy achieved for Latin, Indian, Chinese and Arabic scripts
Summary
Despite advances in offline and online document text recognition, Sindhi handwritten text recognition remains an unsolved problem. Sindhi is one of the ancient Indo-Aryan languages and is spoken by more than forty million people in Sindh province, Pakistan, and in some states of India [1]. It is written in a bidirectional cursive script, in which text runs from right to left and numerals run from left to right. Although the letter shown in green is also present in the Arabic and Persian scripts, it has a completely different meaning and context and produces a different sound when used in the Sindhi script. A detailed review of the issues and challenges associated with handwritten Sindhi text recognition is presented in [3].