Abstract
This paper attempts to segment handwritten Hindi words. The segmentation problem is compounded by the possible presence of modifiers, known as matras, on all sides of the basic characters, and by the uncertainty that different writing styles introduce into the character shapes. We have devised a structural approach to capture the similarities and differences between structure classes. The segmentation is performed in hierarchical order: 1) separating the upper modifiers and header line from the character, 2) detecting and then segmenting lower modifiers from the characters, 3) identifying whether or not a character is a conjunct, and 4) categorizing top modifiers based on Check_point, Mid_point and Touching_points. The segmentation accuracy is found to be around 78%. General conditions are applied to separate modifiers from the characters, but certain words cannot be segmented because they violate these conditions. Such specific cases are not dealt with in this paper, because handling them requires an exhaustive study on a large database, which is not presently available.
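As an illustration of step 1 only, the sketch below locates the header line with a horizontal projection profile and splits a binarized word image into an upper-modifier zone, the header line, and the character zone. The abstract does not say how the header line is found; the projection profile is a commonly used technique substituted here as an assumption, and the function names, the 0.8 threshold ratio, and the toy image are all hypothetical.

```python
def find_header_line(image, ratio=0.8):
    """Return (top, bottom) row indices of the band treated as the header line.

    `image` is a list of rows of 0/1 pixels (1 = ink). Assumption: the header
    line is the run of rows around the densest row whose ink count stays at or
    above `ratio` times the peak. This is not taken from the paper.
    """
    row_sums = [sum(row) for row in image]  # horizontal projection profile
    peak = max(row_sums)
    if peak == 0:
        raise ValueError("image contains no ink pixels")
    threshold = ratio * peak
    peak_row = row_sums.index(peak)

    # Grow the band upward and downward while rows remain dense enough.
    top = peak_row
    while top > 0 and row_sums[top - 1] >= threshold:
        top -= 1
    bottom = peak_row
    while bottom < len(image) - 1 and row_sums[bottom + 1] >= threshold:
        bottom += 1
    return top, bottom


def split_at_header(image, ratio=0.8):
    """Split the image into (upper zone, header line, character zone)."""
    top, bottom = find_header_line(image, ratio)
    return image[:top], image[top:bottom + 1], image[bottom + 1:]


# Hypothetical toy word image: a stroke above the header line, then a body.
word = [
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
]
upper, header, body = split_at_header(word)
```

A fixed ratio is the simplest choice for a sketch. Skewed or broken header lines, which are common in handwriting, would likely need a more tolerant criterion.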
More From: International Journal of Computer Processing of Languages