Handwriting-Based Text Line Segmentation from Malayalam Documents

Pearlsy P V, Deepa Sankar

doi:10.3390/app13179712

Abstract

Optical character recognition for handwritten Malayalam documents remains an open research area, and a major hindrance is the unavailability of a benchmark database. A new database of 402 Malayalam handwritten document images, with ground truth images of 7535 text lines, is therefore developed to implement and evaluate the proposed technique. This paper proposes a technique for extracting text lines from handwritten Malayalam documents that is based specifically on the handwriting of the writer. Text lines are extracted using horizontal and vertical projection values, the size of the handwritten characters, the height of the text lines and the curved nature of the Malayalam alphabet. The proposed technique overcomes incorrect segmentation caused by characters written with spaces above or below other characters and by lines that overlap because of ascenders and descenders. The performance of the proposed method for text line extraction is quantitatively evaluated using the MatchScore metric and is found to be 85.507%. The recognition accuracy, detection rate and F-measure of the proposed method are 99.39%, 85.5% and 91.92%, respectively. It is experimentally verified that the proposed method outperforms some of the existing language-independent text line extraction algorithms.
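The three reported figures can be cross-checked against one another. The sketch below assumes the F-measure is the harmonic mean of detection rate and recognition accuracy, which is the usual convention in text line segmentation evaluation; the abstract does not state the formula, and the function name `f_measure` is only illustrative.

```python
def f_measure(detection_rate: float, recognition_accuracy: float) -> float:
    """Harmonic mean of two rates given in percent.

    Assumption: the paper's F-measure follows the common convention
    FM = 2 * DR * RA / (DR + RA); the abstract does not state the formula.
    """
    total = detection_rate + recognition_accuracy
    if total == 0:
        return 0.0  # avoid division by zero when both rates are zero
    return 2.0 * detection_rate * recognition_accuracy / total


# Values reported in the abstract, in percent.
dr = 85.5   # detection rate
ra = 99.39  # recognition accuracy

fm = f_measure(dr, ra)
print(f"F-measure: {fm:.2f}%")
```

Under that assumption the computed value agrees with the reported 91.92% to two decimal places, so the three figures are mutually consistent.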

Full Text