Abstract

This paper describes the structure of an optical character recognition (OCR) system for printed documents. The system is trained on Latin and Greek typewritten text, but it can be easily adapted to any typewritten character set. The proposed method is divided into two main stages. In the first stage, suitable binary features are extracted, most of which are independent of the scaling and rotation of the characters. A binary tree classification technique is then applied, and an optimal tree classifier is constructed. In the second stage, the characters at the end-nodes of the binary tree are classified using a new template-matching technique. By setting a suitable threshold for the matching, a decision can be reached for most of the characters. For those characters that the binary tree cannot recognize with high confidence, a secondary minimum-distance classifier, trained with the Zernike moments of the characters, is used. Experimental results show that the recognition rate of the proposed OCR system can exceed 99.5%.
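The decision logic of the second stage described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The matching score (fraction of agreeing pixels), the threshold value of 0.9, and the names `match_score`, `nearest_class`, and `classify` are all assumptions made for illustration. The binary tree stage is omitted, so `templates` stands for the candidate characters at a single end-node, and the feature vectors passed to the fallback are plain numbers standing in for Zernike moments, which the sketch does not compute.

```python
from math import dist


def match_score(glyph, template):
    """Fraction of pixels on which two equally sized binary images agree.

    Assumed similarity measure; the abstract does not specify the actual one.
    """
    total = sum(len(row) for row in glyph)
    agree = sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )
    return agree / total


def nearest_class(features, class_means):
    """Minimum-distance classifier: return the label of the closest class mean.

    `features` is a stand-in for a vector of Zernike moments.
    """
    return min(class_means, key=lambda label: dist(features, class_means[label]))


def classify(glyph, features, templates, class_means, threshold=0.9):
    """Accept the best template match if it clears the threshold.

    Otherwise fall back to the secondary minimum-distance classifier.
    The threshold of 0.9 is a hypothetical value.
    """
    best_label = max(templates, key=lambda label: match_score(glyph, templates[label]))
    if match_score(glyph, templates[best_label]) >= threshold:
        return best_label, "template"
    return nearest_class(features, class_means), "min-distance"


# Toy 3x3 templates for the candidates at one end-node (hypothetical data).
templates = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
# Toy class means in a 2-D feature space (hypothetical data).
class_means = {"I": [0.1, 0.9], "L": [0.8, 0.2]}

clean_glyph = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
noisy_glyph = [[0, 1, 1], [1, 1, 0], [0, 0, 1]]

print(classify(clean_glyph, [0.1, 0.9], templates, class_means))
print(classify(noisy_glyph, [0.7, 0.3], templates, class_means))
```

In this toy setting the clean glyph is accepted by template matching, while the noisy glyph falls below the threshold and is passed to the minimum-distance classifier. Raising the threshold sends more characters to the secondary classifier, and lowering it sends fewer.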
