Abstract

Urdu is written in a cursive script in which connected characters form ligatures. To identify characters within ligatures at different scales (font sizes), a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network are used. Both models are trained on previously extracted ligature thickness graphs, from which they derive meta features. These thickness graphs provide consistent information across font sizes. The LSTM and CNN are also trained on raw images so that performance on the two forms of input can be compared. Two corpora are used in this research: Urdu Printed Text Images (UPTI) and Centre for Language Engineering (CLE) Text Images. Overall performance of the networks ranges between 90% and 99.8%. Average accuracy is 98.08% on meta features and 97.07% on raw images.
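
The abstract does not define how a thickness graph is computed, so the following is only a minimal sketch of one way a scale-consistent thickness profile could be obtained. It assumes a binarized ligature image and approximates stroke thickness by the ink count per column. The names `thickness_profile` and `upscale`, and the normalization and resampling choices, are hypothetical illustrations and not the authors' method.

```python
def thickness_profile(image, length=16):
    """Return a fixed-length, peak-normalized column thickness profile.

    image: list of rows of 0/1 ints (1 = ink). This representation is an
    assumption made for illustration; the paper's exact input is not given.
    """
    if not image or not image[0]:
        return [0.0] * length
    width = len(image[0])
    # Ink pixels per column: a crude stand-in for stroke thickness.
    counts = [sum(row[x] for row in image) for x in range(width)]
    peak = max(counts)
    if peak == 0:
        return [0.0] * length
    # Dividing by the peak removes the dependence on vertical scale.
    normalized = [c / peak for c in counts]
    # Nearest-neighbour resampling to a fixed length removes the
    # dependence on horizontal scale.
    return [normalized[(i * width) // length] for i in range(length)]


def upscale(image, factor):
    """Enlarge a binary image by an integer factor (simulates a larger font)."""
    return [
        [px for px in row for _ in range(factor)]
        for row in image
        for _ in range(factor)
    ]


# A tiny hypothetical "ligature" and the same shape at three times the size.
small = [
    [0, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
]
large = upscale(small, 3)

# Both sizes yield the same profile, which is the property the abstract
# attributes to thickness graphs.
same = thickness_profile(small, 4) == thickness_profile(large, 4)
```

Under these assumptions the profile is identical for integer rescalings of the same shape, which illustrates why such a representation could give a network consistent input across font sizes, whereas raw pixel images change with scale.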
