Abstract
This paper presents a simple, novel method for extracting individual text lines from handwritten Malayalam documents. The main challenge in text line extraction from handwritten documents is the segmentation of touching lines. In Malayalam, a further problem arises: symbols such as the chandrakkala can be classified as a separate line because of the small gap between the Malayalam letter and the symbol. This paper addresses both touching lines and the misclassification of marks such as the chandrakkala as separate lines. In the proposed method, the scanned handwritten document is first divided into vertical stripes so that skewed lines can be handled, and lines are then extracted in each stripe separately using horizontal projection. Touching lines, marks such as the chandrakkala segmented as separate lines, and extra lines caused by noise are handled using the median line height, computed separately for each vertical stripe. Dividing the document into vertical stripes cuts through characters that straddle stripe boundaries. The paper therefore also presents a way to rejoin these characters by compensating for the distance of the characters from the top of the line at the joining edge of the vertical stripes.
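The stripe-wise projection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the binary-image representation (a list of rows with 1 for ink and 0 for background), and the thresholds `small_ratio` and `large_ratio` are all assumptions made for illustration. The sketch only flags candidate runs relative to the median height in each stripe; how the paper splits touching lines, merges small marks, or rejoins characters across stripes is not specified in the abstract and is not attempted here.

```python
from statistics import median


def split_into_stripes(image, n_stripes):
    """Cut a binary image (list of rows) into n_stripes vertical stripes."""
    width = len(image[0])
    bounds = [i * width // n_stripes for i in range(n_stripes + 1)]
    return [[row[bounds[i]:bounds[i + 1]] for row in image]
            for i in range(n_stripes)]


def horizontal_projection(stripe):
    """Count ink pixels (value 1) in every row of a stripe."""
    return [sum(row) for row in stripe]


def find_runs(profile):
    """Return (top, bottom) row ranges, bottom exclusive, that contain ink."""
    runs, start = [], None
    for y, value in enumerate(profile):
        if value > 0 and start is None:
            start = y
        elif value == 0 and start is not None:
            runs.append((start, y))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def classify_runs(runs, small_ratio=0.5, large_ratio=1.5):
    """Label each run against the median run height of its stripe.

    The two ratios are hypothetical thresholds, not values from the paper.
    """
    if not runs:
        return []
    heights = [bottom - top for top, bottom in runs]
    med = median(heights)
    labels = []
    for height in heights:
        if height < small_ratio * med:
            labels.append("small")     # candidate mark or noise
        elif height > large_ratio * med:
            labels.append("touching")  # candidate touching lines
        else:
            labels.append("line")
    return labels


def segment_stripes(image, n_stripes):
    """Run projection-based line extraction on each stripe independently."""
    results = []
    for stripe in split_into_stripes(image, n_stripes):
        runs = find_runs(horizontal_projection(stripe))
        results.append(list(zip(runs, classify_runs(runs))))
    return results
```

Calling `segment_stripes(image, n_stripes)` returns, for each stripe, a list of `((top, bottom), label)` pairs. Taking the median per stripe rather than per page keeps the height reference local, which matches the abstract's motivation of handling skewed lines.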