Abstract

Machine Reading Comprehension (MRC) has recently made significant progress. This paper describes our participation in the Vietnamese Machine Reading Comprehension shared task at the 8th International Workshop on Vietnamese Language and Speech Processing (VLSP 2021), for which we built an MRC system specifically for Vietnamese. Following SQuAD2.0, the organizing committee developed the Vietnamese Question Answering Dataset UIT-ViQuAD2.0, a reading comprehension dataset consisting of questions posed by crowd-workers on a set of Vietnamese Wikipedia articles. UIT-ViQuAD2.0 evolved from version 1.0, with the difference that version 2.0 contains both answerable and unanswerable questions. The answer to each question is either a span of text from the corresponding reading passage, or the question is unanswerable; the main challenge of the task is to distinguish between these two cases. Our system employs simple yet highly effective methods. It uses the pre-trained language model XLM-RoBERTa (XLM-R) and then filters the results from multiple output files to produce the final result. We created about 5-7 output files and, for each question, selected the answer that was repeated most often as the final prediction. After filtering, our system's F1 score increased from 75.172% to 76.386%, and it achieved an EM score of 65.329% on the Private Test set.
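The filtering step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each output file has been loaded as a dictionary mapping a question ID to a predicted answer string, with an empty string standing for "unanswerable" (as in SQuAD2.0-style prediction files). The function name `majority_vote`, the tie-breaking rule, and the sample data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(runs):
    """Pick, for each question, the answer predicted by the most runs.

    `runs` is a list of dicts {question_id: answer_text}; an empty string
    is assumed to mean "unanswerable". Ties go to the answer that appears
    first in the order of `runs` (an assumption, not the paper's rule).
    """
    final = {}
    # Collect every question ID seen in any run, preserving first-seen order.
    question_ids = dict.fromkeys(qid for run in runs for qid in run)
    for qid in question_ids:
        votes = Counter(run[qid] for run in runs if qid in run)
        # max() returns the first key with the highest count, and Counter
        # keeps insertion order, so ties resolve to the earliest run.
        final[qid] = max(votes, key=votes.get)
    return final


# Hypothetical predictions from three runs of the model.
example_runs = [
    {"q1": "Hà Nội", "q2": ""},
    {"q1": "Hà Nội", "q2": "1945"},
    {"q1": "Huế", "q2": ""},
]
final_predictions = majority_vote(example_runs)
```

Voting on exact answer strings is the simplest reading of "the answers with the most repetitions"; a real system might normalize whitespace or casing before counting, which this sketch does not do.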
