Abstract

Question generation is the task of generating coherent and relevant questions from context paragraphs. Recently, with the development of large-scale question answering datasets such as SQuAD, question generation for English has advanced rapidly. However, for other languages such as Chinese, the available training data is limited, which hinders progress on question generation in those languages. To investigate multi-lingual question generation, in this paper we develop a language-agnostic language model, which learns shared representations from several languages in a single architecture. We propose an adversarial training objective that encourages the model to learn both language-specific and language-independent information. We also exploit abundant monolingual text to improve multi-lingual question generation via pre-training. With the language-agnostic language model, we achieve significant improvements in multi-lingual question generation across five languages. In addition, we release a large-scale Chinese question generation dataset containing more than 220k human-generated questions to benefit multi-lingual question generation research.
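The abstract does not specify the form of the adversarial objective; a common way to realize "language-specific plus language-independent" representations is to pit the shared encoder against a language discriminator via gradient reversal. The sketch below is a hypothetical, framework-free illustration of that idea: the encoder's update combines the task (question-generation) gradient with the *sign-flipped* discriminator gradient, so the shared representation is pushed to fool the language classifier. All names (`encoder_gradient`, `lam`) are illustrative, not from the paper.

```python
# Hedged sketch of adversarial training via gradient reversal (assumed,
# not confirmed by the abstract). For the shared encoder parameters:
#   update direction = grad(QG loss) - lam * grad(discriminator loss)
# The minus sign reverses the discriminator gradient, so the encoder
# *increases* the language discriminator's loss while still minimizing
# the question-generation loss.

def encoder_gradient(grad_qg, grad_disc, lam=0.1):
    """Combine the task gradient with the reversed discriminator gradient.

    grad_qg   -- gradient of the question-generation loss w.r.t. encoder
    grad_disc -- gradient of the language-discriminator loss w.r.t. encoder
    lam       -- trade-off weight for the adversarial term (hypothetical)
    """
    return [g_q - lam * g_d for g_q, g_d in zip(grad_qg, grad_disc)]

# Toy example with two encoder parameters:
grad = encoder_gradient([0.5, -0.2], [1.0, 0.4], lam=0.1)
print(grad)  # [0.4, -0.24]
```

The discriminator itself would be trained normally (to *minimize* its classification loss); only the gradient flowing back into the shared encoder is reversed, which is what makes the representation language-agnostic.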
