Abstract

Recognition of multi-script documents, both printed and handwritten, remains a challenge because OCR is script dependent. Script identification is therefore a key step in designing an OCR system for multi-script documents. In this paper, we focus on wordwise script identification, since several scripts are often mixed within a single line. We present a method that comprises three steps: word extraction, feature computation, and classification. Words are extracted using morphological dilation. Radon and wavelet transforms are employed to extract features based on directional and multi-resolution analysis. For classification, the performance of LDA, SVM, and KNN classifiers is studied separately. Experiments on our dataset of Kannada and Roman words show that the presented method is robust for wordwise handwritten script identification.
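
The word-extraction step can be illustrated with a minimal sketch. It assumes a binarized image of a single text line and a purely horizontal structuring element. The abstract does not specify the structuring element, its size, or how the dilated regions are turned into words, so the function names, the `width` parameter, and the column-profile grouping below are hypothetical choices made for illustration, not the authors' implementation.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def dilate_horizontal(img: Image, width: int) -> Image:
    """Binary dilation with a 1 x (2*width + 1) horizontal structuring element."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            lo, hi = max(0, c - width), min(cols, c + width + 1)
            # A pixel is set if any ink pixel lies within `width` columns of it.
            if any(img[r][lo:hi]):
                out[r][c] = 1
    return out


def extract_words(img: Image, width: int = 1) -> List[Tuple[int, int]]:
    """Return (start_col, end_col) spans, end exclusive, one per word.

    Assumption: `img` holds a single text line, so words can be separated
    by looking only at which columns contain ink after dilation.
    """
    dilated = dilate_horizontal(img, width)
    cols = len(dilated[0])
    # Column profile: True where the dilated image has ink in any row.
    profile = [any(row[c] for row in dilated) for c in range(cols)]

    # Group consecutive ink columns into spans; dilation has already
    # bridged the narrow gaps between characters of the same word.
    spans = []
    start = None
    for c, has_ink in enumerate(profile):
        if has_ink and start is None:
            start = c
        elif not has_ink and start is not None:
            spans.append((start, c))
            start = None
    if start is not None:
        spans.append((start, cols))

    # Tighten each span back to the ink extent of the original image,
    # removing the padding that dilation added on both sides.
    words = []
    for s, e in spans:
        ink = [c for c in range(s, e) if any(row[c] for row in img)]
        words.append((ink[0], ink[-1] + 1))
    return words


line = [[0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0]]
print(extract_words(line, width=1))
```

In this toy line, the one-column gap between the first two strokes is bridged by the dilation, while the four-column gap is not, so two words are reported. In practice the structuring-element size would have to be tuned to the inter-character and inter-word spacing of the handwriting.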
