Abstract

Today, there is an increasing demand of efficient archival and retrieval methods for online handwritten data. For such tasks, text categorization is of particular interest. The textual data available in online documents can be extracted through online handwriting recognition; however, this process produces errors in the resulting text. This work reports experiments on the categorization of online handwritten documents based on their textual contents. We analyze the effect of word recognition errors on the categorization performances, by comparing the performances of a categorization system with the texts obtained through online handwriting recognition and the same texts available as ground truth. Two well-known categorization algorithms (kNN and SVM) are compared in this work. A subset of the Reuters-21578 corpus consisting of more than 2,000 handwritten documents has been collected for this study. Results show that classification rate loss is not significant, and precision loss is only significant for recall values of 60–80% depending on the noise levels.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call