Abstract
Character segmentation plays a very important role in a text recognition system. The simple technique of using the inter-character gap for segmentation works well for well-printed documents, but it fails to give satisfactory results when the input text contains touching characters. In this paper, we propose two algorithms to segment touching characters and one algorithm to segment overlapping lines in degraded printed Gurmukhi documents. We identify various categories of touching characters in different zones and propose a solution for each. The solution methodology makes extensive use of the structural properties of the Gurmukhi script. The algorithm for segmenting horizontally overlapping lines uses a heuristic based on character height. Multiple horizontally overlapping lines can occur in a number of situations, such as printed newspapers, old magazines, and books. The similarity among Indian scripts allows these algorithms to be applied to segmentation problems in other Indian languages as well.
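To make the two ideas mentioned above concrete, the following is a minimal sketch and not the paper's algorithms. It assumes a binarized image given as rows of 0/1 values (1 = ink). The function names `segment_by_gaps` and `split_tall_strips`, the median-based estimate of line height, and the `factor` threshold are all hypothetical choices made for illustration. The first function shows plain gap-based segmentation using a vertical projection profile; the second shows one possible height-based heuristic for splitting a strip that appears to contain several merged text lines.

```python
from statistics import median


def segment_by_gaps(image):
    """Return (start, end) column ranges of ink runs; end is exclusive.

    `image` is a list of equal-length rows of 0/1 values (1 = ink).
    Columns with no ink are treated as inter-character gaps.
    """
    if not image:
        return []
    width = len(image[0])
    # Vertical projection: number of ink pixels in each column.
    profile = [sum(row[x] for row in image) for x in range(width)]
    segments, start = [], None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                      # a run of ink begins
        elif count == 0 and start is not None:
            segments.append((start, x))    # a gap closes the run
            start = None
    if start is not None:
        segments.append((start, width))    # run reaches the right edge
    return segments


def split_tall_strips(strips, factor=1.5):
    """Split strips much taller than the median strip into equal parts.

    `strips` is a list of (top, bottom) row ranges, bottom exclusive.
    Assumption (not from the paper): the median strip height approximates
    the height of a single text line, and a strip taller than
    `factor` times that height holds several overlapping lines.
    """
    if not strips:
        return []
    typical = median(bottom - top for top, bottom in strips)
    if typical <= 0:
        return list(strips)
    result = []
    for top, bottom in strips:
        height = bottom - top
        parts = round(height / typical) if height > factor * typical else 1
        step = height / parts
        for i in range(parts):
            result.append((top + round(i * step), top + round((i + 1) * step)))
    return result


if __name__ == "__main__":
    image = [
        [1, 1, 0, 1, 1, 1, 0, 0, 1],
        [1, 0, 0, 1, 1, 1, 0, 0, 1],
    ]
    print(segment_by_gaps(image))
    print(split_tall_strips([(0, 10), (12, 22), (24, 44), (46, 56)]))
```

In the sample image, columns 3 to 5 come back as a single segment. If that run were really two touching characters, the gap-based method would have no way to separate them, which is the failure case the abstract describes.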