Abstract

In this paper, we propose a novel contour-tracking method for text line segmentation of Tibetan historical document images written in uchen script. The method segments text lines using their contour curves and consists of three steps. First, we compute the barycentre of each connected component in the text regions and connect the barycentres belonging to each text line in order, so that the main body of the line merges into a single new connected component; the contour curve of that component is then obtained with a contour tracing algorithm. Second, the contour curve and the barycentres are used to assign the remaining key elements (such as syllable points, upper vowels, lower vowels, and broken strokes) to their text lines, and the candidate text lines are then formed from these connected components. Finally, the contour tracing algorithm is applied to the candidate text lines to compute their contour curves, along which the text lines are segmented. We evaluated the method on a data set of 200 document images. Experimental results show that the proposed method segments the text lines of the document images accurately and achieves encouraging results.
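
The first step described above (barycentres of connected components, linked in order into one merged component) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how components are grouped into lines or how the barycentres are joined, so the sketch assumes the input image already contains a single text line, orders the barycentres from left to right, and joins them with straight segments. The function names `connected_components`, `barycentre`, `draw_segment`, and `link_barycentres` are hypothetical.

```python
from collections import deque

def connected_components(img):
    """Return the 8-connected foreground (value 1) components as pixel lists."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, pixels = deque([(y, x)]), []
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                comps.append(pixels)
    return comps

def barycentre(pixels):
    """Mean (row, column) position of a component's pixels."""
    n = len(pixels)
    return (sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n)

def draw_segment(img, p0, p1):
    """Set the pixels on the straight segment p0 -> p1 by linear interpolation."""
    (y0, x0), (y1, x1) = p0, p1
    steps = max(abs(y1 - y0), abs(x1 - x0))
    for i in range(steps + 1):
        t = i / steps if steps else 0.0
        img[round(y0 + t * (y1 - y0))][round(x0 + t * (x1 - x0))] = 1

def link_barycentres(img):
    """Assumption: img holds ONE text line. Link its barycentres left to right."""
    comps = connected_components(img)
    centres = sorted((barycentre(c) for c in comps), key=lambda c: c[1])
    linked = [row[:] for row in img]  # work on a copy, leave the input untouched
    pts = [(round(cy), round(cx)) for cy, cx in centres]
    for a, b in zip(pts, pts[1:]):
        draw_segment(linked, a, b)
    return centres, linked

# Toy single-line image with three separate blobs.
img = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
centres, linked = link_barycentres(img)
```

In this toy case the three blobs become one connected component after linking, which is the shape a contour tracing routine would then follow. A barycentre can fall outside its own component (for ring-shaped glyphs, for example), so a real pipeline would need extra handling that this sketch omits.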
