Abstract

Segmented machine-printed Chinese characters generally suffer from small distortions and small rotations caused by noise and segmentation errors. These effects prevent many conventional methods, especially those based on directional codes, from reaching very high recognition rates, say above 99%. In this paper, regression analysis is proposed as a means of overcoming these problems. First, each segmented character is enclosed in a suitable square box, filtered to reduce noise, and then thinned. Second, the square thinned character image is divided into a 9×9 mesh of blocks, rather than the conventional 8×8, both to suit the structure of Chinese characters and to support global feature extraction. Third, line regression is applied to all black points in each block to obtain either the slope angle or a dispersion code, which is derived from the sample correlation coefficient after a suitable transformation. Each block is thus coded as one of three cases: 'blank', a slope-angle value, or 'dispersion'. The peripheral black points are used for preclassification. The scores for matching two characters are designed so that both learning and recognition are efficient. The design objective of this optical character recognition system is a very small misrecognition rate at a tolerable rejection rate. Experiments were carried out with three fonts, each consisting of 5401 characters. The overall rejection rate is 1.25% and the overall misrecognition rate is 0.33%. These rates are acceptable for most users.
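
The per-block coding step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions, since the abstract does not give them: the names `code_block` and `code_image`, the threshold value 0.8, and the use of an orthogonal (principal-axis) line fit in place of an ordinary y-on-x regression, chosen here so that vertical strokes are handled. The "dispersion" test uses a rotation-invariant linearity measure, which equals the absolute sample correlation coefficient of the points after rotating them so that the fitted line lies at 45°; this stands in for the unspecified transformation. Angles are measured in image coordinates, with y increasing downwards.

```python
import math


def code_block(points, linearity_threshold=0.8):
    """Code one mesh block from its black points, given as (x, y) pairs.

    Returns ("blank", None), ("angle", degrees in [0, 180)) or
    ("dispersion", None). The threshold is an illustrative assumption.
    """
    n = len(points)
    if n == 0:
        return ("blank", None)

    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)

    total = sxx + syy
    if total == 0:
        # All points coincide, so no direction can be estimated.
        return ("dispersion", None)

    # (l1 - l2) / (l1 + l2) for the eigenvalues l1 >= l2 of the scatter
    # matrix: 1 for a perfect line, 0 for an isotropic cloud of points.
    linearity = math.hypot(sxx - syy, 2 * sxy) / total
    if linearity < linearity_threshold:
        return ("dispersion", None)

    # Direction of the principal axis, folded into [0, 180) degrees.
    angle = 0.5 * math.degrees(math.atan2(2 * sxy, sxx - syy))
    return ("angle", angle % 180.0)


def code_image(image, grid=9):
    """Code a square binary image (list of rows, 1 = black) block by block.

    Assumes the side length is a multiple of `grid`.
    """
    step = len(image) // grid
    codes = []
    for by in range(grid):
        row = []
        for bx in range(grid):
            pts = [
                (x, y)
                for y in range(by * step, (by + 1) * step)
                for x in range(bx * step, (bx + 1) * step)
                if image[y][x]
            ]
            row.append(code_block(pts))
        codes.append(row)
    return codes
```

Calling `code_image` on a thinned character image yields a 9×9 grid of codes that a matching score could then compare block by block; how those scores are defined is not stated in the abstract and is left out of the sketch.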

