Automated analysis of mixed documents consisting of printed Korean/alphanumeric texts and graphic images

Young K Ham

doi:10.1117/12.171323

https://doi.org/10.1117/12.171323

Journal: Optical Engineering
Publication Date: Jun 1, 1994
Citations: 9

Affiliation: Sogang University

Abstract

An efficient algorithm is proposed for recognizing mixed documents that consist of printed Korean/alphanumeric text and graphic images. In the preprocessing step, the input document is skew-normalized, if necessary, by rotating it by an angle detected with the Hough transform. The graphic image parts are then separated from the text parts by examining the chain codes of connected components, and individual characters are separated using vertical and horizontal projections. In the recognition step, the mixed text, which contains two different character sets (Korean and alphanumeric), is recognized: Korean and alphanumeric characters are distinguished, and each set is then recognized hierarchically using several effective features. The output is obtained by combining the recognized characters with the separated graphic parts. The performance of the algorithm is demonstrated via computer simulation.
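
The character-separation step mentioned in the abstract (vertical and horizontal projections) can be illustrated with a minimal sketch. This is not the paper's implementation: the binary-image representation (1 = ink, 0 = background), the function names `runs` and `segment_characters`, and the rule of splitting at every empty row or column are all assumptions made here for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed representation: 1 = ink, 0 = background


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges where the profile is non-zero."""
    spans = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            spans.append((start, i))       # the run ended at an empty position
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_characters(image: Image) -> List[Tuple[int, int, int, int]]:
    """Split a binary text image into (top, bottom, left, right) boxes."""
    boxes = []
    # Horizontal projection: ink count per row separates text lines.
    row_profile = [sum(row) for row in image]
    for top, bottom in runs(row_profile):
        line = image[top:bottom]
        # Vertical projection within one line: ink count per column.
        col_profile = [sum(col) for col in zip(*line)]
        for left, right in runs(col_profile):
            boxes.append((top, bottom, left, right))
    return boxes


# Hypothetical 4x5 page: two text lines separated by one empty row.
page = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
]
print(segment_characters(page))
# [(0, 2, 0, 2), (0, 2, 3, 4), (3, 4, 1, 3), (3, 4, 4, 5)]
```

Splitting at every empty column is the simplest possible rule. A practical system would likely need additional logic, for example for characters whose components are separated by a gap; the abstract does not say how the paper handles such cases.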