Abstract

An algorithmic architecture for a high-performance optical character recognition (OCR) system for hand-printed and handwritten addresses is proposed. The architecture integrates syntactic and contextual post-processing with character recognition to optimise postcode recognition performance, and verifies the postcode against simple features extracted from the remainder of the address to ensure a low error rate. An enhanced version of the characteristic loci character recognition algorithm was chosen to make the system tolerant of variations in writing style. Feature selection for the classifier is performed automatically using the B/W algorithm. Syntactic and contextual information for hand-printed British postcodes has been integrated into the system by combining low-level postcode syntax information with a dictionary trie structure. A full implementation of the postcode dictionary trie is described. Features that define the town name effectively, and that can easily be extracted from a handwritten or hand-printed town name, are used for postcode verification. A database totalling 3473 postcode/address images has been used to evaluate the performance of the complete postcode recognition process. The basic character recognition rate for the full unconstrained alphanumeric character set is 63.1%, compared with an expected maximum attainable rate of 75–80%. The addition of the syntactic and contextual knowledge stages produces an overall postcode recognition rate equivalent to an alphanumeric character recognition rate of 86–90%. Separate verification experiments on a subset of 820 address images show that, with the first-order features chosen, an overall correct address feature code extraction rate of around 35% is achieved.
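
As an illustration of how low-level postcode syntax might be combined with a dictionary trie, the following is a minimal sketch and not the implementation described in the paper. It assumes the classifier returns a set of candidate characters for each character position, and it applies only the well-known constraints that a British postcode starts with a letter and ends with a digit followed by two letters. The names `apply_syntax` and `PostcodeTrie`, and the toy dictionary entries, are hypothetical.

```python
import string
from typing import Dict, List, Set

LETTERS = set(string.ascii_uppercase)
DIGITS = set(string.digits)


def apply_syntax(candidates: List[Set[str]]) -> List[Set[str]]:
    """Prune per-position candidates using simplified postcode syntax.

    Assumption: only the first character (a letter) and the inward code
    (digit, letter, letter) are constrained; other positions are left as-is.
    """
    pruned = [set(c) for c in candidates]
    if len(pruned) < 5:  # shorter than the shortest valid postcode
        return pruned
    pruned[0] &= LETTERS
    pruned[-3] &= DIGITS
    pruned[-2] &= LETTERS
    pruned[-1] &= LETTERS
    return pruned


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.terminal = False  # True if a dictionary postcode ends here


class PostcodeTrie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, postcode: str) -> None:
        """Store a postcode, ignoring spaces and case."""
        node = self.root
        for ch in postcode.replace(" ", "").upper():
            node = node.children.setdefault(ch, TrieNode())
        node.terminal = True

    def match(self, candidates: List[Set[str]]) -> List[str]:
        """Return all stored postcodes consistent with the candidate sets."""
        results: List[str] = []

        def walk(node: TrieNode, depth: int, prefix: str) -> None:
            if depth == len(candidates):
                if node.terminal:
                    results.append(prefix)
                return
            # Follow only branches that the classifier also proposed.
            for ch in sorted(candidates[depth]):
                child = node.children.get(ch)
                if child is not None:
                    walk(child, depth + 1, prefix + ch)

        walk(self.root, 0, "")
        return results


# Toy dictionary and ambiguous classifier output (S/5, O/0, 1/I, B/8).
trie = PostcodeTrie()
for entry in ("SO17 1BJ", "SO17 2BJ", "SW1A 1AA"):
    trie.insert(entry)

observed = [{"S", "5"}, {"O", "0"}, {"1", "I"}, {"7"},
            {"1", "I"}, {"B", "8"}, {"J"}]
print(trie.match(apply_syntax(observed)))
```

In this sketch the syntax rules resolve letter/digit confusions at the constrained positions, and the trie walk discards any remaining combination that is not a stored postcode.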
