Ancient–Modern Chinese Translation with a New Large Training Dataset

Dayiheng Liu, Jiancheng Lv, Qian Qu, Kexin Yang

doi:10.1145/3325887

Abstract

Ancient Chinese carries the wisdom and spiritual culture of the Chinese nation. Automatic translation from ancient Chinese to modern Chinese helps to inherit and carry forward the quintessence of the ancients. However, the lack of a large-scale parallel corpus limits research on ancient–modern Chinese machine translation. In this article, we propose an ancient–modern Chinese clause alignment approach based on the characteristics of these two languages. The method combines lexical and statistical information and achieves an F1-score of 94.2 on our manually annotated test set. We use this method to create a new large-scale ancient–modern Chinese parallel corpus that contains 1.24M bilingual pairs. To the best of our knowledge, this is the first large, high-quality ancient–modern Chinese dataset. Furthermore, we analyze and compare the performance of SMT and various NMT models on this dataset and provide a strong baseline for this task.

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Publication Date: May 31, 2019
Citations: 11

Similar Papers

English-Ukrainian Parallel Corpus: Prerequisites for Building and Practical Use in Translation Studies
Svitlana A Matvieieva ... Alla A Zernetska
Studies about Languages | VOL. 1
13 Jul 2022

Neural Machine Translation
Francisco Casacuberta Nolla ... Álvaro Peris Abril
Tradumàtica tecnologies de la traducció
29 Dec 2017

Chinese Historical Term Translation Pairs Extraction Using Modern Chinese as a Pivot Language
Xiaoting Wu ... Lei Jing
01 Jan 2019

A cross-temporal contrastive disentangled model for ancient Chinese understanding
Yuting Wei ... Bin Wu
Neural Networks | VOL. 179
01 Jul 2024
