Abstract

Recent advances in deep learning have delivered state-of-the-art performance in medical analysis. However, deep neural networks (DNNs) require large amounts of training data with high-quality annotations, which are unavailable or expensive in the medical domain. Research on medical-domain neural machine translation (NMT) is largely limited by the lack of parallel sentences annotated with medical-domain background knowledge. To this end, we propose YuQ, a knowledge-driven Chinese-Uyghur NMT dataset grounded in medical-domain knowledge graphs. Our corpus contains 65K parallel sentences (130K utterances) from the medical domain. By introducing medical-domain glossary knowledge into the training model, we address the challenge of low translation accuracy for professional terms in Chinese-Uyghur machine translation. We provide several benchmark models. Ablation study results show that the models can be enhanced by introducing domain knowledge.
