Soft Clustering for Segmenting Touching Characters in Printed Scripts

Keshab Nath, Swarup Roy

doi:10.1007/978-981-13-1906-8_30

Abstract

Segmenting characters from printed script is an important preprocessing step in automatic Optical Character Recognition (OCR). The performance of the machine learning algorithms used for recognition depends on the quality of character segmentation. The task is more challenging when the script contains touching characters, which are prevalent in Indian scripts such as Assamese, Bangla, Devanagari, Oriya, Gurmukhi, and many others. In such cases, the accuracy of an OCR system depends on how well the touching characters are segmented. In this paper, we explore the effectiveness of fuzzy, rough, and rough-fuzzy k-means clustering for segmenting touching characters. We use compound-character datasets from Devanagari, Assamese, and Bangla printed scripts for experimentation. Our results show that soft k-means clustering is an effective alternative method for segmenting touching characters.
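To make the idea of soft clustering concrete, the following is a minimal sketch of fuzzy k-means (also called fuzzy c-means) applied to the foreground pixels of a toy touching pair. It is an illustration only and not the authors' implementation: the abstract does not state the features or parameters used, so clustering on raw (x, y) pixel coordinates, the fuzzifier m = 2, the left-to-right initialisation, and the toy `glyph` bitmap are all assumptions made here. The rough and rough-fuzzy variants are not shown.

```python
import math

def memberships_for(points, centers, m=2.0):
    """Fuzzy membership of every point in every cluster (each row sums to 1)."""
    power = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        # Clamp distances so a pixel lying exactly on a center cannot divide by zero.
        d = [max(math.dist(p, c), 1e-12) for c in centers]
        rows.append([1.0 / sum((dj / dl) ** power for dl in d) for dj in d])
    return rows

def fuzzy_kmeans(points, k=2, m=2.0, iters=100, tol=1e-6):
    """Alternate membership and center updates until the centers stop moving."""
    # Assumption for this sketch: start from pixels at evenly spaced positions
    # in the left-to-right ordering, since the characters sit side by side.
    ordered = sorted(points)
    centers = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        u = memberships_for(points, centers, m)
        new_centers = []
        for j in range(k):
            # Each center is the mean of all pixels weighted by membership**m.
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centers.append(tuple(
                sum(wi * p[axis] for wi, p in zip(w, points)) / total
                for axis in range(2)))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships_for(points, centers, m)

# Hypothetical toy bitmap: two blobs joined by a two-pixel bridge in the middle row.
glyph = [
    "0110000110",
    "1111001111",
    "1111111111",
    "1111001111",
    "0110000110",
]
pixels = [(x, y) for y, row in enumerate(glyph)
          for x, ch in enumerate(row) if ch == "1"]

centers, memberships = fuzzy_kmeans(pixels, k=2)
# Hard labels, if needed, come from the largest membership of each pixel.
labels = [max(range(2), key=lambda j: row[j]) for row in memberships]
```

Unlike hard k-means, every pixel receives a graded membership in each cluster. In this toy case the bridge pixels, which belong ambiguously to both characters, get memberships closer to 0.5 than pixels deep inside either blob, and that graded information is what a soft method can exploit when deciding where to cut.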

Full Text