Abstract

One important reason for the slow pace of Mongolian question answering research is the scarcity of question-answer corpora. In this paper, we construct a dataset of 50,000 Mongolian question-answer pairs by collecting existing Chinese question answering corpora and applying rule-based selection, Chinese-Mongolian translation, and manual correction. Automatic evaluation shows that the question and answer sentences in the corpus are diverse, and manual evaluation shows that 97% of the corpus conforms to everyday question-answer logic. The entries are drawn mainly from daily conversations in various fields. The corpus can be used in end-to-end question answering models and is of great value for Mongolian automatic question answering research.

