Abstract

Many methods of printed character recognition have been proposed to date. Performance figures are usually stated for a particular set of fonts or text sizes, but it is rarely clear under what noise conditions the measurements were taken. Baird has proposed a model of Document Imaging Defects that lets authors compare results against an emerging standard, in which a single figure quantifies the level of noise in the document image. In this paper, a novel method is proposed for the recognition of printed characters, and its extension to the segmentation and recognition of noisy printed words is outlined. The method represents the shape of each character by two Hidden Markov Models. Recognition is achieved by scoring these models against the test pattern and combining the results. Evaluated with Baird's noise model, the method achieves a peak performance of 99.5% on the test set in the presence of near-minimal noise. It generalizes to characters with noise levels greater than those included in the training set. An investigation of top-k performance suggests that a word recognizer employing shallow contextual knowledge could overcome much of the effect of noise on the recognition of natural-language text images.
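The abstract does not say how the two Hidden Markov Models per character are built or how their scores are combined. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes one discrete HMM over a row-wise symbol sequence and one over a column-wise sequence. The observation symbol is a hypothetical feature: the number of black-pixel runs in each row or column. The two scores are combined by summing log-likelihoods. All function names (`run_symbol`, `observations`, `rank_characters`, `top_k_accuracy`) are hypothetical. The sketch scores each model with the scaled forward algorithm, ranks candidate characters, and computes top-k accuracy.

```python
import math
from dataclasses import dataclass


@dataclass
class DiscreteHMM:
    pi: list[float]          # initial state probabilities
    A: list[list[float]]     # A[i][j] = P(state j at t+1 | state i at t)
    B: list[list[float]]     # B[i][k] = P(symbol k | state i)

    def log_likelihood(self, obs: list[int]) -> float:
        """Scaled forward algorithm: returns log P(obs | model)."""
        if not obs:
            raise ValueError("observation sequence must be non-empty")
        n = len(self.pi)
        alpha = [self.pi[i] * self.B[i][obs[0]] for i in range(n)]
        log_p = 0.0
        for t, symbol in enumerate(obs):
            if t > 0:
                alpha = [
                    sum(alpha[i] * self.A[i][j] for i in range(n)) * self.B[j][symbol]
                    for j in range(n)
                ]
            scale = sum(alpha)
            if scale == 0.0:  # sequence is impossible under this model
                return -math.inf
            alpha = [a / scale for a in alpha]  # rescale to avoid underflow
            log_p += math.log(scale)            # P(obs) = product of the scales
        return log_p


def run_symbol(line, max_runs: int = 3) -> int:
    """Hypothetical feature: number of black-pixel runs in one row/column, capped."""
    runs = sum(1 for i, px in enumerate(line) if px and (i == 0 or not line[i - 1]))
    return min(runs, max_runs)


def observations(bitmap: list[list[int]]) -> tuple[list[int], list[int]]:
    """Turn a binary bitmap into a row-wise and a column-wise symbol sequence."""
    rows = [run_symbol(r) for r in bitmap]
    cols = [run_symbol(c) for c in zip(*bitmap)]
    return rows, cols


def rank_characters(bitmap, models) -> list[str]:
    """models maps label -> (row_hmm, col_hmm); returns labels, best first.

    Assumption: the two models' scores are combined by summing log-likelihoods.
    """
    row_obs, col_obs = observations(bitmap)
    scores = {
        label: row_hmm.log_likelihood(row_obs) + col_hmm.log_likelihood(col_obs)
        for label, (row_hmm, col_hmm) in models.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def top_k_accuracy(ranked_lists, truths, k: int) -> float:
    """Fraction of samples whose true label is among the first k candidates."""
    hits = sum(truth in ranked[:k] for ranked, truth in zip(ranked_lists, truths))
    return hits / len(truths)
```

Scoring a test set with `rank_characters` and evaluating `top_k_accuracy` for k = 1, 2, 3, … yields a top-k curve. If the correct character is usually within the first few candidates even when it is not ranked first, a word-level search over those candidates could plausibly recover it, which is the intuition behind the abstract's closing remark.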
