Abstract

Extracting character images is an important front-end step for optical character recognition (OCR) and other applications. It matters because OCR applications usually extract salient features and operate on them, and noise not only destroys character features but also introduces unwanted ones. We propose a new algorithm that removes unwanted background noise from a textual image. The algorithm is based on the observation that the magnitude of the intensity variation at character boundaries differs from that of noise across the scales of the wavelet transform. Most of the edges corresponding to character boundaries can therefore be extracted at each scale by thresholding. The interior region of each character is determined by a voting procedure that uses the arguments of the remaining edges. The recovered character interiors are solid and contain no holes. Because of the smoothing applied in computing the wavelet transform, the recovered characters tend to be fattened. To obtain a high-quality restoration of the character image, the precise locations of the characters in the original image are then estimated using a Bayesian criterion. We describe the algorithm in detail and carefully analyze its free parameters. The method is simple, and experimental results suggest that it is effective.
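The multi-scale observation can be illustrated with a minimal one-dimensional sketch. It is not the algorithm of the paper: it assumes a Haar-like difference-of-means filter in place of the wavelet actually used, a single fixed threshold, and an intersection of the surviving positions across scales, and the names `haar_response` and `edges_kept_at_all_scales` are hypothetical. Under those assumptions, a step edge (standing in for a character boundary) keeps a response of constant magnitude as the scale grows, while an isolated noise spike decays in proportion to the filter width.

```python
def haar_response(signal, width):
    """Mean of the `width` samples to the right of each position minus
    the mean of the `width` samples to its left (Haar-like filter)."""
    n = len(signal)
    out = [0.0] * n  # positions too close to either end are left at 0
    for i in range(width, n - width + 1):
        right = sum(signal[i:i + width]) / width
        left = sum(signal[i - width:i]) / width
        out[i] = right - left
    return out


def edges_kept_at_all_scales(signal, widths=(2, 4, 8), threshold=0.6):
    """Positions whose response magnitude reaches `threshold` at every scale.

    Intersecting across scales is an assumption of this sketch; the
    threshold value is illustrative only.
    """
    kept = None
    for width in widths:
        response = haar_response(signal, width)
        strong = {i for i, r in enumerate(response) if abs(r) >= threshold}
        kept = strong if kept is None else kept & strong
    return sorted(kept)


# Synthetic scan line: a dark-to-bright step at index 32 (a "character
# boundary") plus a single-sample noise spike at index 10.
signal = [0.0] * 64
for i in range(32, 64):
    signal[i] = 1.0
signal[10] = 1.0

print(edges_kept_at_all_scales(signal))
```

With this synthetic input the only surviving position is the step at index 32: the spike yields a response of at most 0.5 at width 2 and less at larger widths, so it never reaches the threshold.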
