Abstract

Machine-printed and handwritten texts frequently appear intermixed in many kinds of documents, such as forms. Classifying text as machine-printed or handwritten is therefore a prerequisite for the subsequent optical character recognition (OCR) task. In this paper, we present a method that automatically determines whether each text block segmented from a document image is machine-printed or handwritten. In our approach, the orientation of a text block is first classified as horizontal or vertical by analyzing the widths of the valleys in the X and Y projection profiles of the text block image. Then, a reduced X–Y cut algorithm is used to extract the base blocks from the text block image. Finally, a spatial feature, the character block layout variance, is devised to perform the classification. Our method can be applied to both English and Chinese document images. Experimental results demonstrate the feasibility of the proposed method for classifying handwritten and machine-printed texts.
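The abstract names the three steps but not their exact rules. The Python sketch below illustrates one possible reading of them on a binary image (1 = ink). Everything beyond "projection profiles", "valley widths", "X–Y cut" and "layout variance" is an assumption, not the paper's method: the orientation rule (compare the widest valley of each profile, falling back on the aspect ratio), the two-level cut used as a stand-in for the reduced X–Y cut, the definition of `layout_variance`, and all function names (`render`, `projection_profiles`, `valley_widths`, `estimate_orientation`, `base_blocks`, `layout_variance`) are hypothetical.

```python
from statistics import fmean, pvariance

# A binary text-block image: a list of rows, 1 = ink pixel, 0 = background.
Image = list[list[int]]
# A base block's bounding box as half-open (top, bottom, left, right).
Box = tuple[int, int, int, int]


def render(height: int, width: int, boxes: list[Box]) -> Image:
    """Build a synthetic binary image by filling the given boxes with ink."""
    img = [[0] * width for _ in range(height)]
    for top, bottom, left, right in boxes:
        for r in range(top, bottom):
            for c in range(left, right):
                img[r][c] = 1
    return img


def projection_profiles(img: Image) -> tuple[list[int], list[int]]:
    """Return (x_profile, y_profile): ink counts per column and per row."""
    return [sum(col) for col in zip(*img)], [sum(row) for row in img]


def _runs(profile: list[int]) -> list[tuple[int, int]]:
    """Half-open [start, end) ranges of consecutive nonzero profile entries."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def valley_widths(profile: list[int]) -> list[int]:
    """Widths of the empty gaps between ink runs; outer margins are ignored."""
    runs = _runs(profile)
    return [nxt[0] - cur[1] for cur, nxt in zip(runs, runs[1:])]


def estimate_orientation(img: Image) -> str:
    """Hypothetical rule: the widest valley marks the gap between text lines
    (Y profile) or text columns (X profile); if either profile has no
    valleys, fall back on the block's aspect ratio."""
    x_profile, y_profile = projection_profiles(img)
    x_valleys, y_valleys = valley_widths(x_profile), valley_widths(y_profile)
    if not x_valleys or not y_valleys:
        return "horizontal" if len(img[0]) >= len(img) else "vertical"
    return "horizontal" if max(y_valleys) > max(x_valleys) else "vertical"


def base_blocks(img: Image, orientation: str) -> list[list[Box]]:
    """Simplified two-level X-Y cut: split the block into text lines, split each
    line at empty gaps, and shrink every piece to its ink bounding box."""
    if orientation == "vertical":
        # Reuse the horizontal case on the transposed image, then swap axes back.
        transposed = [list(col) for col in zip(*img)]
        return [[(left, right, top, bottom) for top, bottom, left, right in line]
                for line in base_blocks(transposed, "horizontal")]
    lines = []
    for r0, r1 in _runs([sum(row) for row in img]):
        col_profile = [sum(img[r][c] for r in range(r0, r1)) for c in range(len(img[0]))]
        line = []
        for c0, c1 in _runs(col_profile):
            rows = _runs([sum(img[r][c0:c1]) for r in range(r0, r1)])
            line.append((r0 + rows[0][0], r0 + rows[-1][1], c0, c1))
        lines.append(line)
    return lines


def layout_variance(lines: list[list[Box]], orientation: str) -> float:
    """Hypothetical stand-in feature: per-line variance of block centres and
    sizes across the reading direction, divided by the squared mean size and
    averaged over lines. Identical, aligned blocks score exactly 0."""
    lo = 0 if orientation == "horizontal" else 2  # use (top, bottom) or (left, right)
    scores = []
    for line in lines:
        centres = [(box[lo] + box[lo + 1]) / 2 for box in line]
        sizes = [box[lo + 1] - box[lo] for box in line]
        scores.append((pvariance(centres) + pvariance(sizes)) / fmean(sizes) ** 2)
    return fmean(scores) if scores else 0.0
```

For three identical, baseline-aligned rectangles, `layout_variance` returns 0.0; shifting and resizing the rectangles makes it positive. Turning the score into a printed/handwritten decision would still require a threshold or trained classifier, which the abstract does not specify.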
