Abstract
An optical character reader for processing typeset documents must be able to handle proportional spacing, touching characters, and a wide variety of type fonts. This paper describes the design of a multifont character recognizer that uses a binary decision tree to classify a character on the basis of 197 geometric features. The algorithm for designing the decision tree is based on an entropy minimization procedure and makes no assumptions about the distribution or independence of the binary features. The decision tree classifier provides confidence measures that may be used to reduce the substitution error rate at the expense of a higher rejection rate. Methods of reducing the overall error rate by combining the decision tree classifier with other classifiers were also examined. In particular, the paper evaluates the performance of a classifier that combines multiple decision trees, template matching, and contextual post-processing. Error rates were highly sensitive to typeface, ranging from 0.1 percent to 10 percent. Computer processing times for the various stages of the system are presented.
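The abstract does not spell out the tree design procedure, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a greedy builder that, at each node, picks the binary feature giving the lowest weighted class entropy, and it assumes that a leaf's confidence is the fraction of training samples at that leaf belonging to the majority class. The names `entropy`, `build_tree`, `classify`, `min_size`, and `min_confidence` are hypothetical, and the three-feature data stands in for the 197 geometric features.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def build_tree(samples, labels, min_size=1):
    """Greedy tree over binary feature vectors (an assumed, generic procedure).

    Each internal node tests the single feature whose 0/1 split yields the
    lowest weighted entropy; each leaf stores the class counts that reach it.
    """
    counts = Counter(labels)
    if len(counts) == 1 or len(labels) <= min_size:
        return {"counts": counts}
    best = None  # (weighted entropy, feature index)
    for f in range(len(samples[0])):
        zeros = [y for x, y in zip(samples, labels) if x[f] == 0]
        ones = [y for x, y in zip(samples, labels) if x[f] == 1]
        if not zeros or not ones:
            continue  # this feature does not split the node
        score = (len(zeros) * entropy(zeros)
                 + len(ones) * entropy(ones)) / len(labels)
        if best is None or score < best[0]:
            best = (score, f)
    if best is None:
        return {"counts": counts}  # no feature separates the samples
    f = best[1]
    node = {"feature": f}
    for value in (0, 1):
        subset = [(x, y) for x, y in zip(samples, labels) if x[f] == value]
        node[value] = build_tree([x for x, _ in subset],
                                 [y for _, y in subset], min_size)
    return node

def classify(tree, x, min_confidence=0.0):
    """Return (label, confidence); label is None when the sample is rejected."""
    while "feature" in tree:
        tree = tree[x[tree["feature"]]]
    label, n = tree["counts"].most_common(1)[0]
    confidence = n / sum(tree["counts"].values())
    if confidence < min_confidence:
        return None, confidence  # reject rather than risk a substitution
    return label, confidence

# Toy training set: three binary features per character, made-up labels.
samples = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
labels = ["c", "c", "e", "e", "o"]
tree = build_tree(samples, labels)
print(classify(tree, (1, 1, 1)))
```

Raising `min_confidence` in this sketch trades substitutions for rejections: samples that land in impure leaves are returned as `None` instead of being assigned a possibly wrong label, which mirrors the trade-off the abstract describes.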
More From: International Journal of Pattern Recognition and Artificial Intelligence