Abstract

Recent advances in machine reading comprehension (MRC) have been driven by large-scale pre-trained language models such as BERT. However, performance remains limited when the context is long and contains many passages. BERT can embed only as much of a passage as its input size allows, so long passages must be split with a sliding window, which breaks the continuity of information. In this paper, we propose a BERT-based MRC framework tailored to long-passage contexts in a Thai corpus. Our framework employs multi-passage BERT together with a self-adjusting dice loss, which helps the model focus on the answer region of the context passage. We also show that performance improves further when an auxiliary task is added. Experiments were conducted on the Thai question answering (QA) dataset used in the Thailand National Software Competition. The results show that our method outperforms a traditional BERT framework.
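
The self-adjusting dice loss mentioned above down-weights easy, well-classified tokens so that training concentrates on hard answer-region tokens. Below is a minimal sketch of that loss (following Li et al., 2020, "Dice Loss for Data-imbalanced NLP Tasks"), assuming a PyTorch setting; the function name, signature, and default hyperparameters are illustrative, not the paper's exact implementation.

```python
import torch

def self_adjusting_dice_loss(logits, targets, alpha=1.0, gamma=1.0):
    """Self-adjusting dice loss for token-level span classification.

    logits:  (batch, num_classes) raw scores.
    targets: (batch,) integer gold-class labels.
    alpha:   exponent of the (1 - p)^alpha focusing factor.
    gamma:   smoothing constant added to numerator and denominator.
    """
    probs = torch.softmax(logits, dim=-1)
    # Probability the model assigns to the gold class of each example.
    p = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    # (1 - p)^alpha shrinks the contribution of easy examples, pushing
    # the model to focus on hard (e.g. answer-region) predictions.
    weight = (1.0 - p) ** alpha
    # Soft dice coefficient per example; gold indicator y = 1 here.
    dice = (2.0 * weight * p + gamma) / (weight * p + 1.0 + gamma)
    return (1.0 - dice).mean()
```

In a multi-passage BERT setup, a loss of this form would typically replace the standard cross-entropy over answer start/end positions, since cross-entropy treats the overwhelming number of non-answer tokens equally and lets them dominate training.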
