Abstract

In optical character recognition (OCR), segmentation is a crucial phase. Here, segmentation covers all levels: page, line, word and character segmentation. The character recognition rate of any OCR system depends largely on correct and accurate segmentation. This paper addresses character segmentation for medieval handwritten Devnagari manuscripts, which are hundreds of years old. In modern Devnagari, a shirorekha (upper horizontal line) is drawn over each word, whereas in medieval Devnagari a separate shirorekha is drawn over each character. Using this distinctive feature as a key, a novel Shirorekha Based Character Segmentation (SBCS) method is proposed. In this technique, the shirorekha is first identified and then examined horizontally to find breaks in it. Each break in the shirorekha is treated as a possible segmentation point for a character. These candidate points are then scanned vertically for spacing between two characters, and the segmentation points are finalized according to the gap between characters. With this approach, a segmentation accuracy of 88.28% is achieved, which is higher than that of many existing approaches applied to modern Devnagari script. To the best of our knowledge, no prior research on character segmentation for medieval Devnagari script exists; this is the first attempt of its kind.
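The two-stage idea described above (breaks in the shirorekha as candidates, then a vertical gap check to confirm them) can be illustrated with a minimal sketch. This is not the authors' SBCS implementation: the function names, the single-row shirorekha, the "densest row in the upper half" heuristic and the `min_gap` threshold are all assumptions made for illustration. The input is assumed to be one binarized text line given as a list of rows, with 1 for ink and 0 for background.

```python
def find_shirorekha_row(img):
    """Assumed heuristic: the shirorekha is the densest row in the upper half."""
    upper = img[: max(1, len(img) // 2)]
    ink_per_row = [sum(row) for row in upper]
    return ink_per_row.index(max(ink_per_row))


def runs_of_zero(values):
    """Return half-open (start, end) index ranges where values are 0."""
    runs, start = [], None
    for i, v in enumerate(values):
        if v == 0 and start is None:
            start = i
        elif v != 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(values)))
    return runs


def segment_columns(img, min_gap=1):
    """Return column indices at which to cut between characters.

    min_gap is a hypothetical threshold: the minimum number of
    consecutive ink-free columns needed to accept a candidate.
    """
    width = len(img[0])
    head = find_shirorekha_row(img)
    # Vertical projection: amount of ink in each column.
    col_ink = [sum(row[x] for row in img) for x in range(width)]

    cuts = []
    # Step 1: each break in the shirorekha is a candidate point.
    for start, end in runs_of_zero(img[head]):
        if start == 0 or end == width:
            continue  # page margin, not a gap between two characters
        # Step 2: look for ink-free columns inside the break.
        gaps = runs_of_zero(col_ink[start:end])
        if not gaps:
            continue  # strokes touch below the break, so reject it
        g0, g1 = max(gaps, key=lambda g: g[1] - g[0])
        if g1 - g0 >= min_gap:
            cuts.append(start + (g0 + g1) // 2)
    return cuts
```

Calling `segment_columns` on a line image returns the accepted cut columns. A break in the shirorekha with no ink-free column beneath it yields no cut, which mirrors the abstract's distinction between possible and finalized segmentation points.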
