Abstract

In mixed-mode communication, text regions must be separated from black-and-white figure regions, and the character region extracted efficiently, before individual characters can be recognized. The problems that arise in this procedure are detecting and correcting the inclination of the document, separating contact (touching) characters, merging disconnected characters, and extracting the letterhead. This paper describes the results of studies on these problems. The documents considered are English text images containing binary figures and a letterhead. The basic idea is as follows. Connected regions are obtained by directional propagation and shrinking, so that figures are merged. Then (1) a thinning process is performed to detect the inclination angle of the input text, and (2) the sizes of the connected regions and their relative locations are examined to extract the letterhead. The pitch is estimated, and statistical data about individual characters are used to separate contact characters or merge disconnected characters. Experiments were carried out on 10- and 12-pitch printed characters; the correct extraction rate was 100 percent for individual characters and 94.6 percent for letterheads. These results verify that the proposed method is useful for extracting the character region.
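The pitch-based separation and merging step can be illustrated with a minimal sketch. This is not the paper's algorithm. It assumes each connected region on a text line is already reduced to a horizontal extent `(left, right)` in pixels, and it uses a simple median of left-edge spacings as the pitch estimate. The function names `estimate_pitch` and `normalize_boxes` and the merge and split rules are hypothetical choices made for illustration only.

```python
from statistics import median


def estimate_pitch(boxes):
    """Estimate the character pitch (in pixels) of one fixed-pitch text line.

    Assumption: the median spacing between the left edges of neighbouring
    regions is a reasonable pitch estimate when most characters are
    correctly isolated.
    """
    lefts = sorted(left for left, _ in boxes)
    steps = [b - a for a, b in zip(lefts, lefts[1:])]
    return median(steps)


def normalize_boxes(boxes, pitch):
    """Merge fragments of one character and split contact characters.

    Hypothetical rules for this sketch:
    - merge a region into its predecessor when the combined extent still
      fits inside one pitch (a disconnected character);
    - split a region into round(width / pitch) equal parts when it spans
      more than one character cell (contact characters).
    """
    merged = []
    for left, right in sorted(boxes):
        if merged and right - merged[-1][0] <= pitch:
            prev_left, prev_right = merged[-1]
            merged[-1] = (prev_left, max(prev_right, right))
        else:
            merged.append((left, right))

    result = []
    for left, right in merged:
        count = max(1, round((right - left) / pitch))
        step = (right - left) / count
        for i in range(count):
            result.append((round(left + i * step), round(left + (i + 1) * step)))
    return result
```

In this sketch a region about two pitches wide is cut into two cells, and two narrow neighbouring fragments that together fit in one pitch are joined into one character. A real system would also draw on the per-character statistics mentioned in the abstract rather than on equal-width cuts alone.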
