Abstract

For a multi-script, multilingual country like India, script identification is a complex real-life problem in the automation of document processing, and handwritten script identification is considerably more complex than its printed counterpart. In this work, scripts in multi-script handwritten documents are identified, and the performance of several well-known classifiers is compared. We follow a two-stage approach. First, we consider six scripts used to write six official languages of India in the handwritten domain, chosen because samples of them were readily available to us. From abstract/mathematical features, structure-based features, and script-dependent features computed at the document level, a 41-dimensional feature set is prepared. Then, a series of classifiers, namely Logistic Model Tree, Random Forest, Multilayer Perceptron, Sequential Minimal Optimization (SMO), LibLINEAR, RBFNetwork, and the Fuzzy Unordered Rule Induction Algorithm, is applied to the feature set to classify the six handwritten scripts, and the results are compared. Among these classifiers, Logistic Model Tree achieves the highest accuracy, 91.2% under 5-fold cross-validation, whereas the SMO model has the lowest convergence time, 0.05 s.
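
The evaluation protocol summarized above (a classifier scored by mean accuracy under 5-fold cross-validation on a 41-dimensional, six-class feature set) can be illustrated with a minimal sketch. It rests on several assumptions. The data are synthetic Gaussian clusters produced by a hypothetical `make_synthetic` helper, not the paper's handwritten-document features. A simple nearest-centroid classifier stands in for the classifiers named above, none of which is reproduced here. The helper names are likewise illustrative only.

```python
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k folds of nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_centroid_fit(X, y):
    """Stand-in classifier (assumption): store the mean feature vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def nearest_centroid_predict(model, x):
    """Predict the class whose centroid has the smallest squared Euclidean distance to x."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))


def cross_val_accuracy(X, y, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average accuracy over all k folds."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [j for j in range(len(X)) if j not in test_set]
        model = nearest_centroid_fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        correct = sum(nearest_centroid_predict(model, X[j]) == y[j] for j in test_idx)
        scores.append(correct / len(test_idx))
    return statistics.mean(scores)


def make_synthetic(n_per_class=20, n_classes=6, dim=41, seed=1):
    """Hypothetical data: one Gaussian cluster per 'script' class in a 41-D feature space."""
    rng = random.Random(seed)
    X, y = [], []
    for c in range(n_classes):
        center = [rng.uniform(-3, 3) for _ in range(dim)]
        for _ in range(n_per_class):
            X.append([m + rng.gauss(0, 1) for m in center])
            y.append(c)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    print(f"5-fold accuracy: {cross_val_accuracy(X, y):.3f}")
```

Comparing several classifiers, as the abstract describes, would amount to swapping a different fit/predict pair into `cross_val_accuracy` while keeping the same folds (same `seed`), so that every model is scored on identical train/test splits.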
