Abstract

This paper proposes a general-purpose Chinese document recognition system that achieves a high recognition rate, especially on low-quality images. The system consists of three modules: preprocessing, a recognition kernel, and post-processing. In the preprocessing module, we propose a fast rotation transformation algorithm. Because the recognition engines operate on individual characters, document images are segmented into text blocks, then text lines, and finally character images. In the recognition module, two recognition engines recognize the character images; the weights of these kernels and their features are calculated from the relative stroke widths of the character images. In the post-processing module, we calculate a confidence value for each candidate and select the most confident candidate as the OCR result. Experiments show that the proposed system is both effective and efficient.
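
The weighting and candidate-selection steps summarized above can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: the function names `engine_weights` and `select_candidate`, the linear mapping from relative stroke width to engine weights, and the use of a weighted sum as the confidence value are all hypothetical, not the authors' method.

```python
from collections import defaultdict


def engine_weights(relative_stroke_width):
    """Hypothetical mapping: clamp the relative stroke width to [0, 1] and use
    it as the weight of engine A; engine B gets the remainder."""
    w_a = min(max(relative_stroke_width, 0.0), 1.0)
    return w_a, 1.0 - w_a


def select_candidate(cands_a, cands_b, relative_stroke_width):
    """Combine the (character, score) candidates of two engines and return the
    candidate with the highest combined confidence.

    The weighted sum is an assumed confidence measure, chosen only to show
    the idea of stroke-width-dependent weighting.
    """
    w_a, w_b = engine_weights(relative_stroke_width)
    confidence = defaultdict(float)
    for char, score in cands_a:
        confidence[char] += w_a * score
    for char, score in cands_b:
        confidence[char] += w_b * score
    best = max(confidence, key=confidence.get)
    return best, confidence[best]


# Made-up candidate lists for one character image.
cands_a = [("大", 0.6), ("太", 0.4)]
cands_b = [("太", 0.7), ("犬", 0.3)]
best_char, best_conf = select_candidate(cands_a, cands_b, relative_stroke_width=0.25)
print(best_char, round(best_conf, 3))
```

With a small relative stroke width, the sketch trusts the second engine more, so a candidate that both engines propose can outrank the first engine's top choice. A real system would calibrate the weights and confidence measure on labeled data.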
