Abstract

In multilingual countries like India, the majority of documents may contain text in more than one script or language. Automatic processing of such documents through optical character recognition (OCR) therefore requires a multilingual OCR system. With reference to Karnataka state, this paper proposes a handwritten Kannada and English character recognition system. Zone-based pixel density features are used to classify Kannada and English characters. A total of 6,000 handwritten Kannada and English sample images are used for classification. The character images are normalised to 32 × 32 pixels. Each normalised image is then divided into 64 zones, and the pixel density of each zone is calculated, giving a total of 64 features. These features are fed to KNN and SVM classifiers for recognition of the characters. Two-fold cross validation is used to measure the performance of the classifiers. The proposed algorithm classifies Kannada numerals, Kannada vowels, English numerals and English uppercase letters, both independently and in combination. Average recognition accuracies of 89.21% with the KNN classifier and 93.22% with the SVM classifier are achieved. The novelty of the proposed algorithm is that it is independent of character thinning and character slant.
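
The feature extraction step described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: the 64 zones are taken to be an 8 × 8 grid of non-overlapping 4 × 4 pixel blocks, the input is taken to be a binarised image in which non-zero values mark foreground (ink) pixels, and "pixel density" is taken to mean the fraction of foreground pixels in a zone. The function name `zone_pixel_densities` and its parameter `zones_per_side` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def zone_pixel_densities(
    image: Sequence[Sequence[int]], zones_per_side: int = 8
) -> List[float]:
    """Return one foreground-pixel density per zone, in row-major zone order.

    Assumption: `image` is a square binary grid (non-zero = foreground),
    e.g. a 32 x 32 normalised character image.
    """
    size = len(image)
    if size == 0 or any(len(row) != size for row in image):
        raise ValueError("image must be a non-empty square grid")
    if size % zones_per_side != 0:
        raise ValueError("image side must be divisible by zones_per_side")

    zone = size // zones_per_side  # 4 when a 32 x 32 image is split 8 x 8
    area = zone * zone

    features: List[float] = []
    for zone_row in range(zones_per_side):
        for zone_col in range(zones_per_side):
            # Count foreground pixels inside this zone.
            count = sum(
                1
                for r in range(zone_row * zone, (zone_row + 1) * zone)
                for c in range(zone_col * zone, (zone_col + 1) * zone)
                if image[r][c]
            )
            # Density = fraction of the zone covered by foreground pixels.
            features.append(count / area)
    return features


# Usage: a blank 32 x 32 image with the top-left 4 x 4 zone fully filled.
sample = [[0] * 32 for _ in range(32)]
for r in range(4):
    for c in range(4):
        sample[r][c] = 1
feature_vector = zone_pixel_densities(sample)
```

Under these assumptions, each 32 × 32 image yields a 64-element feature vector with values between 0 and 1, which could then be passed to any KNN or SVM implementation and evaluated with two-fold cross validation.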
