Abstract

We examine the use of character image analysis coupled with contextual information in complex data gathering forms to identify and correct optical character recognition (OCR) system rejection and substitution errors. Segmented characters from a complex data gathering form are initially classified using an OCR engine based on a combination of Karhunen-Loeve transforms and a back-propagation neural network. Systems of equations are derived from the data gathering form to determine the values of characters rejected by the OCR engine and to verify the consistency of the data captured. If the OCR results for a single form are determined to be inconsistent with respect to the form's data relationships, a set of decision algorithms which incorporates a second neural network and uses additional character features is used to tag characters according to their likelihood of substitution error. Potential substitution errors are incrementally added to the set of OCR reject errors and are processed through dynamically selected systems of equations and search techniques which correct both error classes. We provide experimental results and determine the extent to which errors can be detected and corrected for various OCR error rates.

© (1993) COPYRIGHT SPIE--The International Society for Optical Engineering.
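As a rough illustration of the consistency-equation idea described above, the sketch below recovers a single rejected numeric field from a form relationship. The field names, the sum constraint, and all values are hypothetical examples, not taken from the paper:

```python
# Sketch: using a form's data relationship (a sum constraint) to recover
# a character/field the OCR engine rejected, and to verify consistency.
# Field names and values are hypothetical illustrations.

def solve_rejected(fields, total):
    """Given OCR'd numeric fields that must sum to `total`, fill in a
    single rejected field (marked None) from the sum constraint."""
    rejected = [name for name, value in fields.items() if value is None]
    if len(rejected) != 1:
        raise ValueError("exactly one rejected field can be recovered")
    known_sum = sum(v for v in fields.values() if v is not None)
    recovered = dict(fields)
    recovered[rejected[0]] = total - known_sum
    return recovered

def is_consistent(fields, total):
    """Verify the captured data satisfies the form's sum relationship."""
    return sum(fields.values()) == total

# Example: the OCR engine rejected the hypothetical "tax" field.
form = {"subtotal": 120, "tax": None, "shipping": 10}
filled = solve_rejected(form, total=138)
```

In the paper's setting, inconsistent forms trigger a second stage that tags likely substitution errors and re-solves with those fields also treated as unknowns; the sketch covers only the single-reject case.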
