Abstract

In this paper, Delaunay triangulation is applied for the extraction of text areas in a document image. By representing the location of connected components in a document image with their centroids, the page structure is described as a set of points in two-dimensional space. When imposing Delaunay triangulation on these points, the text regions in the Delaunay triangulation will have distinguishing triangular features from image and drawing regions. For analysis, the Delaunay triangles are divided into four classes. The study reveals that specific triangles in text areas can be clustered together and identified as text body. Using this method, text regions in a document image containing fragments can also be recognized accurately. Experiments show the method is also very efficient.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.