Abstract

Optical character recognition (OCR) converts a document image into editable text. The process passes through several phases before final character recognition, and one of the most important is segmentation, in which the document image is divided into individual lines, words, and characters. The character recognition rate depends largely on correct segmentation. An OCR system requires several kinds of segmentation: page, line, word, and character segmentation. This paper proposes a script-independent, projection-based approach to line segmentation in medieval handwritten Devnagari manuscripts. The input document is scanned horizontally pixel by pixel, a histogram is created for each scanned line, and text lines are segmented at revised local minima. The revised minima make the technique suitable for Indian scripts, which have modifiers (matras) above and below the characters. The paper addresses segmentation of medieval Devnagari manuscripts, which are very old and degraded by age, insects, weather, and other factors. Cleaning such images to make them noise-free is a major challenge, and a noisy image can produce incorrect segmentation.

Keywords: Segmentation; Line segmentation; Projection profile; Medieval Devnagari manuscript; Image processing; Optical character recognition
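
The horizontal scan and histogram described in the abstract can be illustrated with a minimal sketch. It assumes a binarized page held in a NumPy array where 1 marks ink and 0 marks background, and it cuts lines wherever the row histogram falls to a threshold. The names `horizontal_profile`, `segment_lines`, and `gap_threshold` are hypothetical and chosen for illustration. The paper's revised local minima rule is not specified in the abstract and is not reproduced here; this is only the plain projection-profile baseline that such a rule would refine.

```python
import numpy as np


def horizontal_profile(binary):
    """Count ink pixels (value 1) in each pixel row of a binarized page."""
    return binary.sum(axis=1)


def segment_lines(binary, gap_threshold=0):
    """Return (top, bottom) row ranges, bottom exclusive, one per text line.

    A row is treated as part of a text line when its ink count exceeds
    gap_threshold (an assumed, illustrative parameter). Runs of such rows
    form the line bands; rows at or below the threshold are the valleys.
    """
    profile = horizontal_profile(binary)
    is_text = profile > gap_threshold
    lines = []
    start = None
    for row, inked in enumerate(is_text):
        if inked and start is None:
            start = row                      # a text band begins
        elif not inked and start is not None:
            lines.append((start, row))       # the band ends at a valley
            start = None
    if start is not None:                    # band touching the bottom edge
        lines.append((start, len(is_text)))
    return lines


# Toy page: two ink bands separated by empty rows.
page = np.zeros((10, 8), dtype=np.uint8)
page[1:3, 2:6] = 1
page[5:8, 1:7] = 1
print(segment_lines(page))  # [(1, 3), (5, 8)]
```

With a threshold of zero, a detached modifier above or below a line would come out as its own thin band, and noise specks in a degraded scan would bridge the valleys between lines. Those are the cases a revised minima rule and prior noise removal would need to handle.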

