High-precision two-kernel Chinese character recognition in general document processing systems

San-Lung Zhao, Hsi-Jian Lee

doi:10.1109/icdar.2001.953863

Abstract

This paper proposes a general Chinese document recognition system that achieves a high recognition rate, especially on low-quality images. The system consists of preprocessing, recognition kernel, and post-processing modules. In the preprocessing module, a fast rotation transformation algorithm is proposed. Because the recognition engines operate on individual characters, document images must be segmented into text blocks, then text lines, and finally character images. In the recognition module, two recognition engines (kernels) are used to recognize the character images. The weights of these kernels and their features are calculated from the relative stroke widths of the character images. In the post-processing module, we calculate confidence values for the candidates and select the most confident candidate as the OCR result. Experiments show that the proposed system is both effective and efficient.
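
The two-kernel combination described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so everything specific here is an assumption made for illustration: the function names `kernel_weights` and `combine_candidates`, the linear mapping from relative stroke width to kernel weights, and the weighted-sum confidence are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple


def kernel_weights(relative_stroke_width: float) -> Tuple[float, float]:
    """Map a relative stroke width to weights for two kernels.

    ASSUMPTION: a simple linear rule, clamped to [0, 1]. The paper's
    actual weighting function is not stated in the abstract.
    """
    w_a = min(max(relative_stroke_width, 0.0), 1.0)
    return w_a, 1.0 - w_a


def combine_candidates(
    scores_a: Dict[str, float],
    scores_b: Dict[str, float],
    relative_stroke_width: float,
) -> Tuple[str, Dict[str, float]]:
    """Merge candidate scores from two kernels and pick the best one.

    ASSUMPTION: confidence is a weighted sum of per-kernel scores; a
    candidate missing from one kernel's list contributes 0 from it.
    """
    w_a, w_b = kernel_weights(relative_stroke_width)
    candidates = set(scores_a) | set(scores_b)
    confidence = {
        c: w_a * scores_a.get(c, 0.0) + w_b * scores_b.get(c, 0.0)
        for c in candidates
    }
    # Select the candidate with the highest combined confidence.
    best = max(confidence, key=confidence.get)
    return best, confidence


if __name__ == "__main__":
    # Hypothetical candidate scores for one character image.
    scores_a = {"大": 0.9, "太": 0.6}
    scores_b = {"太": 0.8, "犬": 0.5}
    best, conf = combine_candidates(scores_a, scores_b, 0.25)
    print(best, conf)
```

In this sketch, a character image with thin strokes shifts the decision toward the second kernel, while thick strokes shift it toward the first; the real system may use a different relation between stroke width and weights.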

Full Text