Abstract

In natural language processing, there is no specialized dataset for the Analects, which makes it difficult to assess whether language models can capture the semantic relationship between the Analects and modern Mandarin. To address this gap, this paper proposes the Analects-Mandarin Parallel Dataset (AMPD), which contains the text of the Analects with its corresponding modern Mandarin version, keywords from the Analects with their annotations, and sentiment labels. We also propose four baseline tasks and benchmark each of them with currently popular algorithms.
