Abstract

We examine the use of character image analysis coupled with contextual information in complex data gathering forms to identify and correct optical character recognition (OCR) system rejection and substitution errors. Segmented characters from a complex data gathering form are initially classified using an OCR engine based on a combination of Karhunen-Loeve transforms and a back-propagation neural network. Systems of equations are derived from the data gathering form to determine the values of characters rejected by the OCR engine and to verify the consistency of the data captured. If the OCR results for a single form are determined to be inconsistent with respect to the form's data relationships, a set of decision algorithms which incorporates a second neural network and uses additional character features is used to tag characters according to their likelihood of substitution error. Potential substitution errors are incrementally added to the set of OCR reject errors and are processed through dynamically selected systems of equations and search techniques which correct both error classes. We provide experimental results and determine the extent to which errors can be detected and corrected for various OCR error rates.

© (1993) COPYRIGHT SPIE--The International Society for Optical Engineering.
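As a rough illustration of the consistency-equation idea described above, the sketch below recovers a single rejected numeric field from a form relationship. The field names, the sum constraint, and all values are hypothetical examples, not taken from the paper:

```python
# Sketch: using a form's data relationship (a sum constraint) to recover
# a character/field the OCR engine rejected, and to verify consistency.
# Field names and values are hypothetical illustrations.

def solve_rejected(fields, total):
    """Given OCR'd numeric fields that must sum to `total`, fill in a
    single rejected field (marked None) from the sum constraint."""
    rejected = [name for name, value in fields.items() if value is None]
    if len(rejected) != 1:
        raise ValueError("exactly one rejected field can be recovered")
    known_sum = sum(v for v in fields.values() if v is not None)
    recovered = dict(fields)
    recovered[rejected[0]] = total - known_sum
    return recovered

def is_consistent(fields, total):
    """Verify the captured data satisfies the form's sum relationship."""
    return sum(fields.values()) == total

# Example: the OCR engine rejected the hypothetical "tax" field.
form = {"subtotal": 120, "tax": None, "shipping": 10}
filled = solve_rejected(form, total=138)
```

In the paper's setting, inconsistent forms trigger a second stage that tags likely substitution errors and re-solves with those fields also treated as unknowns; the sketch covers only the single-reject case.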
