Online Handwritten Gurmukhi Strokes Dataset Based on Minimal Set of Words

Sukhdeep Singh,Indu Chhabra,Anuj Sharma

doi:10.1145/2896318

Abstract

The online handwriting data are an integral part of data analysis and classification research, as collected handwritten data offers many challenges to group handwritten stroke classes. The present work has been done for grouping handwritten strokes from the Indic script Gurmukhi. Gurmukhi is the script of the popular and widely spoken language Punjabi. The present work includes development of the dataset of Gurmukhi words in the context of online handwriting recognition for real-life use applications, such as maps navigation. We have collected the data of 100 writers from the largest cities in the Punjab region. The writers’ variations, such as writing skill level (beginner, moderate, and expert), gender, right or left handedness, and their adaptability to digital handwriting, have been considered in dataset development. We have introduced a novel technique to form handwritten stroke classes based on a limited set of words. The presence of all alphabets including vowels of Gurmukhi script has been considered before selection of a word. The developed dataset includes 39,411 strokes from handwritten words and forms 72 classes of strokes after using a k-means clustering technique and manual verification through expert and moderate writers. We have achieved recognition results using the Hidden Markov Model as 87.10%, 85.43%, and 84.33% for middle zone strokes when using training data as 66%, 50%, and 80% of the developed dataset. The present work is a step in a direction to find groups for unknown handwriting strokes with reasonably higher levels of accuracy.

Full Text