Abstract

Scanning or photographing paper manuscripts is the basis of automatic data entry for digitized documents using Optical Character Recognition (OCR). When the paper is too thin or the ink bleeds through, content from the reverse side appears in the digital image as blur, which commonly causes OCR recognition errors. This article proposes an algorithm that combines wavelet transform analysis, median filtering, and histogram adjustment to recover the Chinese character signal from the blurred mixture. The algorithm is examined on a scanned book page printed on thin paper, in which the Chinese characters are mixed with a chart showing through from the reverse side of the page as the blur signal. Simulation indicates that the combined algorithm raises the recognition accuracy rate on the blurred manuscript image to 45%, compared with only 1% for the original image. The proposed algorithm can be integrated directly into OCR software to obtain higher accuracy in digital character recognition and automatic data entry.
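The abstract names the three stages but not their order or parameters, so the following is only a minimal sketch of how such a pipeline could be wired together, not the authors' implementation. It assumes a one-level Haar wavelet, soft thresholding of the detail bands, a 3x3 median filter, and a linear histogram stretch. The function names and the values `detail_threshold`, `black`, and `white` are all hypothetical choices made for illustration.

```python
from statistics import median


def haar_dwt2(img):
    """One-level 2D Haar transform; img is a list of rows with even height and width."""
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, len(img), 2):
        rows = ([], [], [], [])
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            rows[0].append((a + b + d + e) / 2)  # approximation
            rows[1].append((a - b + d - e) / 2)  # column-difference detail
            rows[2].append((a + b - d - e) / 2)  # row-difference detail
            rows[3].append((a - b - d + e) / 2)  # diagonal detail
        for band, row in zip((ll, lh, hl, hh), rows):
            band.append(row)
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2."""
    out = []
    for r in range(len(ll)):
        top, bottom = [], []
        for c in range(len(ll[0])):
            s, x, y, z = ll[r][c], lh[r][c], hl[r][c], hh[r][c]
            top += [(s + x + y + z) / 2, (s - x + y - z) / 2]
            bottom += [(s + x - y - z) / 2, (s - x - y + z) / 2]
        out += [top, bottom]
    return out


def soft_threshold(band, t):
    """Shrink coefficients toward zero by t (assumed denoising rule)."""
    return [[(abs(v) - t) * (1 if v > 0 else -1) if abs(v) > t else 0.0
             for v in row] for row in band]


def median3(img):
    """3x3 median filter with edge pixels replicated."""
    h, w = len(img), len(img[0])
    clamp = lambda i, n: min(max(i, 0), n - 1)
    return [[median(img[clamp(r + dr, h)][clamp(c + dc, w)]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1))
             for c in range(w)] for r in range(h)]


def stretch(img, black, white):
    """Linear histogram stretch: black -> 0, white -> 255, clipped outside."""
    scale = white - black
    return [[round(min(max((v - black) / scale, 0.0), 1.0) * 255) for v in row]
            for row in img]


def restore(img, detail_threshold=10.0, black=60, white=180):
    """Hypothetical pipeline: wavelet shrinkage, median filter, histogram stretch."""
    ll, lh, hl, hh = haar_dwt2(img)
    lh, hl, hh = (soft_threshold(b, detail_threshold) for b in (lh, hl, hh))
    smoothed = median3(haar_idwt2(ll, lh, hl, hh))
    return stretch(smoothed, black, white)
```

In this sketch the histogram stretch is the step that removes faint show-through: any gray level lighter than `white` is mapped to pure white, while dark ink is pushed toward black. On real scans the thresholds would need to be tuned against the actual page histogram.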
