Contrastive Analysis and Feature Selection for Korean Modal Expression in Chinese-Korean Machine Translation System

Jin-Ji Li, Dong-Il Kim, Ji-Eun Roh, Jong-Hyeok Lee

doi:10.1142/s0219427905001298

Abstract

For a machine translation (MT) system, a natural modal expression is the most important factor in generating a proper Korean predicate. Tense, aspect, mood, negation, and voice are the major constituents of modal expression. The linguistic encoding of modal expressions differs considerably between Chinese and Korean, which are distinct in both linguistic typology and genealogy. This paper proposes a new, practically applicable categorization of the Korean modality system, viz. tense, aspect, mood, negation, and voice, through a contrastive analysis of Chinese and Korean from the viewpoint of a practical MT system. To determine the modal expression precisely, effective feature selection frameworks for Chinese are presented using a variety of machine learning methods. The proposed approach achieved an accuracy of 83.10%.
