AMPD: an Analects-Mandarin parallel dataset for bidirectional translation

Sihan Wang, Chunzhi Xie, Hao Li, Yajun Du, Zhisheng Gao

doi:10.1080/17445760.2024.2350683

Journal: International Journal of Parallel, Emergent and Distributed Systems

Publication Date: May 18, 2024

Affiliation: Xihua University

Abstract

In natural language processing, no specialized dataset exists for the Analects, which makes it difficult to assess whether language models can capture the semantic relationship between the Analects and modern Mandarin. To address this gap, this paper proposes AMPD (Analects-Mandarin Parallel Dataset), which pairs passages of the Analects with their modern Mandarin translations and also includes keywords from the Analects with their annotations, as well as sentiment labels. Additionally, we propose four baseline tasks and benchmark each of them with currently popular algorithms.
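
The abstract lists what each AMPD entry contains (an Analects passage, its modern Mandarin counterpart, annotated keywords, and sentiment) and frames the dataset around bidirectional translation. The sketch below shows one plausible way to hold such an entry and expand it into training pairs for both translation directions. It rests on assumptions: the class name `AMPDRecord`, its field names, the direction tags, and the helper `translation_pairs` are all hypothetical and do not describe the dataset's actual file format. The sample line is the well-known opening of the Analects with a hand-written rendering. It is not an actual AMPD record, and its sentiment label is purely illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AMPDRecord:
    """Hypothetical layout of one AMPD entry; field names are illustrative only."""
    analects: str                 # original Classical Chinese passage
    mandarin: str                 # corresponding modern Mandarin rendering
    keywords: Dict[str, str] = field(default_factory=dict)  # keyword -> annotation
    sentiment: str = ""           # sentiment label (AMPD's real label set is not given here)


def translation_pairs(records: List[AMPDRecord]) -> List[Tuple[str, str, str]]:
    """Expand each record into two (direction, source, target) examples,
    one per translation direction (hypothetical helper)."""
    pairs: List[Tuple[str, str, str]] = []
    for r in records:
        pairs.append(("analects->mandarin", r.analects, r.mandarin))
        pairs.append(("mandarin->analects", r.mandarin, r.analects))
    return pairs


if __name__ == "__main__":
    # Hand-written illustration, not taken from AMPD.
    sample = AMPDRecord(
        analects="学而时习之，不亦说乎？",
        mandarin="学了又时常温习，不也很愉快吗？",
        keywords={"说": "同“悦”，愉快"},
        sentiment="positive",  # illustrative label only
    )
    for direction, src, tgt in translation_pairs([sample]):
        print(direction, src, "=>", tgt)
```

Emitting both directions from a single aligned pair is what makes one parallel corpus usable for Analects-to-Mandarin and Mandarin-to-Analects translation alike. The keyword and sentiment fields stay attached to the record so that other tasks could read them from the same source.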