Abstract
For each scanned text image, we generate the morphological pattern spectrum, which captures the shape and size of the objects in the image. We use the spectrum to characterize the noise content of a text document image by considering only the region of the spectrum near the origin. Noise is known to affect many image processing operations; in this experiment we consider optical character recognition (OCR). Using a linear distributed associative memory (DAM), we associate noise, as characterized by a partial pattern spectrum, with OCR performance, as measured by an error rate. The DAM is trained to recognize the spectra of three classes of images: those with high, medium, and low OCR error rates. The DAM is not forced to make a classification every time: it may reject as unknown any presented spectrum that does not closely resemble one stored in it. The DAM was fairly accurate with noisy images but conservative (i.e., it rejected several text images as unknown) when there was little noise.
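As a rough illustration of the pipeline described above, the following Python sketch computes a partial pattern spectrum from a binary image and classifies it with a small linear associative memory that is allowed to answer "unknown". It is not the authors' implementation. The square structuring element, the outer-product training rule averaged per class, the normalized response, the 0.9 rejection threshold, all function names, and the toy spectra are assumptions chosen only to keep the example short.

```python
from math import sqrt

def _window(img, y, x, r):
    """Yield pixels of the (2r+1)x(2r+1) square at (y, x); outside the image is 0."""
    h, w = len(img), len(img[0])
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            yy, xx = y + dy, x + dx
            yield img[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0

def erode(img, r):
    return [[int(all(_window(img, y, x, r))) for x in range(len(img[0]))]
            for y in range(len(img))]

def dilate(img, r):
    return [[int(any(_window(img, y, x, r))) for x in range(len(img[0]))]
            for y in range(len(img))]

def area(img):
    return sum(map(sum, img))

def partial_pattern_spectrum(img, max_size):
    """PS(n) = area(opening at size n) - area(opening at size n+1), n < max_size.

    Only the bins near the origin are kept, where small noise objects show up.
    """
    areas = [area(dilate(erode(img, r), r)) for r in range(max_size + 1)]
    return [areas[n] - areas[n + 1] for n in range(max_size)]

def unit(v):
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def train_dam(examples, labels):
    """Assumed rule: M = sum_k b_k a_k^T with one-hot b_k, averaged per class."""
    dim = len(examples[0][0])
    memory = [[0.0] * dim for _ in labels]
    counts = [0] * len(labels)
    for spectrum, label in examples:
        c = labels.index(label)
        counts[c] += 1
        for j, value in enumerate(unit(spectrum)):
            memory[c][j] += value
    return [[v / max(n, 1) for v in row] for row, n in zip(memory, counts)]

def recall(memory, labels, spectrum, reject_below=0.9):
    """Linear recall y = M x; reject as 'unknown' if the best response is weak."""
    x = unit(spectrum)
    response = [sum(m * v for m, v in zip(row, x)) for row in memory]
    best = max(range(len(labels)), key=response.__getitem__)
    return labels[best] if response[best] >= reject_below else "unknown"

# Toy image: one isolated noise pixel plus a 3x3 blob.
image = [
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 0],
]
spectrum = partial_pattern_spectrum(image, 2)

# Made-up training spectra, one per OCR error-rate class.
LABELS = ["high", "medium", "low"]
memory = train_dam([([9, 1, 0], "high"), ([1, 9, 0], "medium"),
                    ([0, 1, 9], "low")], LABELS)
print(spectrum, recall(memory, LABELS, [8, 2, 0]), recall(memory, LABELS, [1, 1, 1]))
```

In this sketch the isolated pixel falls in the first spectrum bin, since it disappears under the smallest nontrivial opening, which is the sense in which the region near the origin reflects noise. Raising `reject_below` makes the memory more conservative, at the cost of more spectra being returned as "unknown".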