Comparison and analysis of two methods for improving the accuracy of OpenNMT in literary and vernacular Chinese translation

Yiwen Li

doi:10.54254/2755-2721/53/20241428

Abstract

Despite advances in machine translation, translating accurately between Literary and Vernacular Chinese remains difficult. Literary Chinese is concise and consists largely of monosyllabic words, whereas Vernacular Chinese uses more polysyllabic words. This study uses the OpenNMT system to address this difficulty: the training data are segmented with different Classical Chinese word segmentation tools, and 2-layer Long Short-Term Memory (LSTM) and Transformer models are trained on the results. The models are then compared to measure the improvement in precision. The findings show that segmenting Classical Chinese at the character level and training a Transformer model with OpenNMT significantly improves precision. This outcome supports the observation that existing Classical Chinese word segmentation methods are not accurate enough, which in turn degrades the quality of translation between Literary and Vernacular Chinese. The study thereby contributes to improving the accuracy of machine translation between Literary and Vernacular Chinese.
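
To make the character-level segmentation mentioned in the abstract concrete, the following is a minimal Python sketch, not the author's actual preprocessing code. It assumes the training corpus is plain text with one sentence per line and that the model reads whitespace-separated tokens. The function names char_segment and segment_corpus are hypothetical and introduced only for illustration.

```python
def char_segment(sentence: str) -> str:
    """Return the sentence with every non-whitespace character
    separated by a single space (character-level segmentation)."""
    # Existing whitespace (including the full-width space U+3000) is dropped
    # so that each remaining character becomes exactly one token.
    return " ".join(ch for ch in sentence if not ch.isspace())


def segment_corpus(lines):
    """Apply character-level segmentation to an iterable of sentences."""
    return [char_segment(line) for line in lines]


if __name__ == "__main__":
    # Illustrative Literary Chinese sentence (an assumption, not from the paper's data).
    print(char_segment("學而時習之"))
```

Treating each character as a token sidesteps the word segmenter entirely, so segmentation errors cannot propagate into training. The trade-off is longer token sequences for the model to process.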

Full Text