RESEARCH ON CHINESE OCR IN TAIWAN

Fang-Hsuan Cheng, Wen-Hsing Hsu

doi:10.1142/s0218001491000107

Abstract

This paper describes typical research on Chinese optical character recognition (OCR) in Taiwan. Chinese characters can be represented by a set of basic line segments called strokes, and several approaches to recognizing handwritten Chinese characters by stroke analysis are described here. A typical OCR system consists of four main parts: image preprocessing, feature extraction, radical extraction, and matching. Image preprocessing converts the input image into a format suitable for further processing. Feature extraction obtains stable features from the character. Radical extraction decomposes the character into radicals. Finally, matching identifies the character. Strokes are used as the features for Chinese character recognition for two reasons. First, every Chinese character can be represented as a combination of strokes. Second, algorithms built on strokes do not need to be modified as the number of characters increases. The algorithms described in this paper are therefore suitable for recognizing large sets of Chinese characters.
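The abstract names four stages and argues that stroke-based algorithms scale to larger character sets without modification. The Python sketch below illustrates that architecture under strong simplifying assumptions. Input strokes arrive as straight segments, whereas real systems must first extract them from a bitmap. The four stroke classes, the 22.5-degree angle bins, the `gap` threshold for splitting radicals, and the L1 distance are all hypothetical choices. Every function name is illustrative rather than taken from the paper. Each stage is a toy stand-in, not the authors' method.

```python
import math
from collections import Counter
from dataclasses import dataclass
from itertools import zip_longest

# Hypothetical basic stroke set; the abstract does not list the paper's actual strokes.
STROKE_TYPES = ("horizontal", "vertical", "left_falling", "right_falling")


@dataclass(frozen=True)
class Segment:
    """A straight stroke segment in image coordinates (y grows downward)."""
    x0: float
    y0: float
    x1: float
    y1: float


def preprocess(segments):
    """Stage 1 (stand-in): translate and uniformly scale strokes into the unit square."""
    xs = [v for s in segments for v in (s.x0, s.x1)]
    ys = [v for s in segments for v in (s.y0, s.y1)]
    min_x, min_y = min(xs), min(ys)
    scale = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    return [Segment((s.x0 - min_x) / scale, (s.y0 - min_y) / scale,
                    (s.x1 - min_x) / scale, (s.y1 - min_y) / scale) for s in segments]


def classify(seg):
    """Stage 2 (stand-in): label a segment by its undirected orientation."""
    # Flip dy so angles follow the usual math convention, then fold into [0, 180).
    angle = math.degrees(math.atan2(-(seg.y1 - seg.y0), seg.x1 - seg.x0)) % 180
    if angle < 22.5 or angle >= 157.5:
        return "horizontal"
    if 67.5 <= angle < 112.5:
        return "vertical"
    # Upper-left to lower-right folds to about 135 degrees; upper-right to lower-left to about 45.
    return "right_falling" if angle >= 112.5 else "left_falling"


def extract_radicals(segments, gap=0.15):
    """Stage 3 (stand-in): cut strokes into left-to-right groups wherever an
    empty vertical band wider than the hypothetical `gap` separates them."""
    groups, right_edge = [], -math.inf
    for s in sorted(segments, key=lambda seg: min(seg.x0, seg.x1)):
        if not groups or min(s.x0, s.x1) - right_edge > gap:
            groups.append([])
        groups[-1].append(s)
        right_edge = max(right_edge, s.x0, s.x1)
    return groups


def character_feature(segments):
    """Stages 1-3 combined: per-radical counts of each stroke type, left to right."""
    groups = extract_radicals(preprocess(segments))
    return tuple(tuple(Counter(map(classify, g))[t] for t in STROKE_TYPES) for g in groups)


def distance(a, b):
    """L1 distance between features; a radical missing from one side counts in full."""
    zero = (0,) * len(STROKE_TYPES)
    return sum(abs(p - q) for ra, rb in zip_longest(a, b, fillvalue=zero)
               for p, q in zip(ra, rb))


def recognize(segments, references):
    """Stage 4 (stand-in): return the nearest reference character under `distance`."""
    feature = character_feature(segments)
    return min(references, key=lambda ch: distance(feature, references[ch]))


# Toy reference strokes: hand-made coordinates, not data from the paper.
REFERENCE_STROKES = {
    "十": [Segment(0, 5, 10, 5), Segment(5, 0, 5, 10)],
    "木": [Segment(0, 3, 10, 3), Segment(5, 0, 5, 10),
          Segment(5, 3, 0, 9), Segment(5, 3, 10, 9)],
    "林": [Segment(0, 3, 4, 3), Segment(2, 0, 2, 10), Segment(2, 4, 0, 7), Segment(2, 4, 4, 7),
          Segment(6, 3, 10, 3), Segment(8, 0, 8, 10), Segment(8, 4, 6, 7), Segment(8, 4, 10, 7)],
}

if __name__ == "__main__":
    refs = {ch: character_feature(segs) for ch, segs in REFERENCE_STROKES.items()}
    # Only the horizontal and vertical strokes of 木: prints 十
    print(recognize(REFERENCE_STROKES["木"][:2], refs))
```

Recognition here compares stroke-type counts against a dictionary. Supporting another character therefore only means adding an entry to the hypothetical `REFERENCE_STROKES` table, and none of the stage functions change. This mirrors, in miniature, the scalability argument above. However, plain counts of stroke types are far too coarse to separate a realistic vocabulary.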
