ViMRC VLSP 2021: XLM-R versus PhoBERT on Vietnamese Machine Reading Comprehension

Nhat Nguyen Duy, Phong Nguyen-Thuan Do

doi:10.25073/2588-1086/vnucsce.334

Abstract

The global development of Industry 4.0 is creating challenges for Artificial Intelligence (AI) in general and Natural Language Processing (NLP) in particular. Machine Reading Comprehension (MRC) is an NLP task with real-world applications that requires machines to determine the correct answers to questions based on a given document. MRC systems must not only answer questions when possible but also recognize when the document supports no answer and abstain from answering. In this paper, we describe our system for the VLSP 2021 shared task: Vietnamese Machine Reading Comprehension with UIT-ViQuAD 2.0. We propose a model for this task, called MRC4MRC, which combines two MRC components. Based on the XLM-RoBERTa (XLM-R) pre-trained language model, MRC4MRC achieves an F1-score (F1) of 79.13% and an Exact Match (EM) of 69.72% on the public-test set. Our experiments also show that the XLM-R language model outperforms the strong PhoBERT language model on UIT-ViQuAD 2.0.
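For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how EM and token-level F1 are commonly computed for a single question when unanswerable questions are allowed. It assumes the SQuAD 2.0-style convention, in which an empty string stands for "no answer", and a simplified normalization (lowercasing and whitespace splitting). The function names are illustrative, and the official VLSP 2021 evaluation script may normalize text differently.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase and split on whitespace (an assumed, simplified normalization)."""
    return text.lower().split()


def exact_match(prediction: str, gold: str) -> float:
    """Return 1.0 if the normalized prediction equals the normalized gold answer."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted and a gold answer string."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    # Unanswerable case (assumed convention): an empty string means "no answer".
    # Score 1.0 only if both sides are empty, otherwise 0.0.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts the tokens shared by both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a system that correctly abstains on an unanswerable question scores 1.0 on both metrics, while one that returns a span for it scores 0.0. Dataset-level scores are typically the average over all questions, taking the maximum over the gold answers when a question has several.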

Full Text