Abstract

Graphical designs are often used in Japanese newspaper headlines to highlight hot articles. However, conventional OCR software seldom recognizes the characters in such headlines because the designs are difficult to remove. This paper proposes a method that recognizes these characters without removing the graphical designs. First, the number of text-line regions and the average character height are roughly estimated from the local distribution of black and white runs observed in a rectangular window as the window is shifted pixel-by-pixel along the text-line direction. Second, each text-line region is normalized so that its height matches the height of the binary reference patterns in the dictionary. Third, displacement matching is applied to each normalized text-line region for character recognition: a square window is shifted pixel-by-pixel along the text-line direction, and at each position it is matched against the binary reference patterns. The complementary similarity measure, which is robust against graphical designs, is used as the discriminant function. When the maximum similarity value at a position exceeds a threshold, which is determined automatically from the degree of degradation in the square window, the category with that similarity value is taken as the recognized category. Experimental results for fifty Japanese newspaper headlines show that the method achieves recognition rates of over 90%, much higher than the 17% achieved by a conventional method.
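A minimal Python sketch of the height-normalization and displacement-matching steps may make the pipeline more concrete. It rests on several assumptions not stated in the abstract. Binary images are stored as lists of 0/1 rows. Reference patterns are square h×h arrays. Heights are rescaled by nearest-neighbour sampling, since the abstract does not say how normalization is done. The complementary similarity measure takes the form usually given in the literature, S = (ad - bc) / sqrt(T(n - T)). Here a, b, c and d count pixels that are black/black, black/white, white/black and white/white in the input window and the reference, n = a + b + c + d, and T = a + c. The fixed `threshold` argument is only a placeholder for the degradation-dependent threshold described above, which this sketch does not reproduce. The function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

# Binary image stored as a list of rows: 1 = black pixel, 0 = white pixel.
Pattern = List[List[int]]


def normalize_height(region: Pattern, target_h: int) -> Pattern:
    """Nearest-neighbour rescale of a text-line region to target_h rows,
    scaling the width by the same factor to keep the aspect ratio
    (assumed interpolation; the abstract does not specify one)."""
    src_h, src_w = len(region), len(region[0])
    new_w = max(1, round(src_w * target_h / src_h))
    return [
        [region[(r * src_h) // target_h][(c * src_w) // new_w] for c in range(new_w)]
        for r in range(target_h)
    ]


def complementary_similarity(window: Pattern, ref: Pattern) -> float:
    """Complementary similarity measure between an input window f and a
    reference pattern t of the same shape (assumed form, see lead-in)."""
    a = b = c = d = 0
    for f_row, t_row in zip(window, ref):
        for f, t in zip(f_row, t_row):
            if f and t:
                a += 1  # black in both
            elif f:
                b += 1  # black in input only
            elif t:
                c += 1  # black in reference only
            else:
                d += 1  # white in both
    n = a + b + c + d
    t_black = a + c  # T: number of black pixels in the reference
    denom = math.sqrt(t_black * (n - t_black))
    return (a * d - b * c) / denom if denom else 0.0


def displacement_match(
    line: Pattern, refs: Dict[str, Pattern], threshold: float
) -> List[Tuple[int, str, float]]:
    """Shift an h x h window one pixel at a time along a height-normalized
    text line; report positions whose best CSM score exceeds threshold."""
    h, width = len(line), len(line[0])
    hits: List[Tuple[int, str, float]] = []
    for x in range(width - h + 1):
        window = [row[x:x + h] for row in line]
        best_label: Optional[str] = None
        best_score = -math.inf
        for label, ref in refs.items():
            score = complementary_similarity(window, ref)
            if score > best_score:
                best_label, best_score = label, score
        # Placeholder fixed threshold; the paper derives it from degradation.
        if best_label is not None and best_score > threshold:
            hits.append((x, best_label, best_score))
    return hits


if __name__ == "__main__":
    # Hypothetical toy dictionary: a vertical and a horizontal bar.
    refs = {
        "|": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
        "-": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
    }
    line = [
        [0, 1, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 1, 1, 1],
        [0, 1, 0, 0, 0, 0, 0],
    ]
    hits = displacement_match(normalize_height(line, 3), refs, threshold=4.0)
    print([(x, label) for x, label, _ in hits])  # [(0, '|'), (4, '-')]
```

In this form the measure is not normalized to [0, 1]. An exact match scores sqrt(T(n - T)), which is sqrt(18) ≈ 4.24 for the 3×3 bars above, so any threshold depends on the window size. A fuller implementation would also likely need to merge neighbouring positions that report the same category.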
