Abstract

Machine reading comprehension (MRC) is a challenging natural language processing (NLP) task with wide application potential in question answering robots, human-computer interaction in mobile virtual reality systems, and related fields. Recently, the emergence of pretrained models (PTMs) has brought this research field into a new era, in which the training objective plays a key role. The masked language model (MLM) is a self-supervised training objective widely used in various PTMs. As training objectives have evolved, many variants of the MLM have been proposed, such as whole word masking, entity masking, phrase masking, and span masking, which differ in the length of the masked token spans. Similarly, the answer length differs across machine reading comprehension tasks: an answer is often a word, a phrase, or a sentence. Thus, for MRC tasks with different answer lengths, whether the masking length of the MLM is related to performance is a question worth studying. If this hypothesis holds, it can guide us in pretraining an MLM with a mask length distribution suited to a given MRC task. In this paper, we try to uncover how much of the MLM's success on machine reading comprehension tasks comes from the correlation between the masking length distribution and the answer lengths in the MRC dataset. To address this issue, (1) we propose four MRC tasks with different answer length distributions, namely, a short span extraction task, a long span extraction task, a short multiple-choice cloze task, and a long multiple-choice cloze task; (2) we create four Chinese MRC datasets for these tasks; (3) we pretrain four masked language models according to the answer length distributions of these datasets; and (4) we conduct ablation experiments on the datasets to verify our hypothesis. The experimental results demonstrate that our hypothesis holds: on all four machine reading comprehension datasets, the model whose masking length distribution correlates with the answer lengths outperforms the model without such correlation.
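To make the idea concrete, the following is a minimal, hypothetical sketch (not the authors' released code) of span masking in which the lengths of the masked spans are sampled from a distribution chosen to match the answer lengths of an MRC dataset; the names `mask_spans`, `length_dist`, and `short_dist` are our own illustrations.

```python
import random

MASK = "[MASK]"

def mask_spans(tokens, length_dist, mask_ratio=0.15):
    """Replace random contiguous spans of `tokens` with [MASK].

    `length_dist` maps span length -> probability, e.g. estimated from
    the answer-length distribution of the target MRC dataset.
    """
    tokens = list(tokens)
    budget = max(1, int(len(tokens) * mask_ratio))
    lengths = list(length_dist.keys())
    weights = list(length_dist.values())
    masked = 0
    while masked < budget:
        # Sample a span length, then a random start position for the span.
        span_len = random.choices(lengths, weights=weights, k=1)[0]
        start = random.randrange(0, max(1, len(tokens) - span_len + 1))
        for i in range(start, min(start + span_len, len(tokens))):
            if tokens[i] != MASK:
                tokens[i] = MASK
                masked += 1
    return tokens

# A short-answer-like distribution: mostly 1-2 token spans.
short_dist = {1: 0.6, 2: 0.3, 3: 0.1}
print(mask_spans("machine reading comprehension is a challenging NLP task".split(), short_dist))
```

A long-answer-oriented variant would simply shift the probability mass of `length_dist` toward longer spans while keeping the same overall masking budget.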

Highlights

  • In the field of natural language processing (NLP), machine reading comprehension (MRC) is a challenging task and has received extensive attention

  • On four different machine reading comprehension datasets, the model whose masking length distribution correlates with the answer lengths outperforms the model without such correlation

  • In order to quantitatively verify whether masking schemes with different lengths affect the performance of the masked language model (MLM), we propose two span extraction tasks with different answer lengths for Chinese machine reading comprehension


Introduction

In the field of natural language processing (NLP), machine reading comprehension (MRC) is a challenging task and has received extensive attention. Most early reading comprehension systems were based on retrieval technology; that is, the system searches the article according to the question and returns the relevant sentences as the answer. With the development of machine learning (especially deep learning) and the release of large-scale datasets, the efficiency and quality of MRC models have been greatly improved. BERT uses unsupervised learning to pretrain on a large-scale corpus and creatively uses the MLM and NSP subtasks to enhance the language ability of the model [5]. After the authors released the code and pretrained models, BERT was immediately adopted by researchers for various NLP tasks, and the previous SOTA results were refreshed frequently and significantly.
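For reference, the BERT-style MLM objective corrupts roughly 15% of input positions, replacing 80% of the selected positions with [MASK], 10% with a random token, and leaving 10% unchanged; the model is then trained to recover the original tokens. Below is a minimal sketch of this corruption step (our own illustration under the standard 15%/80-10-10 scheme, not BERT's released implementation).

```python
import random

def bert_mlm_corrupt(token_ids, vocab_size, mask_id, mlm_prob=0.15):
    """BERT-style MLM corruption: ~15% of positions are selected; of those,
    80% become [MASK], 10% a random token, and 10% are left unchanged.
    Returns (corrupted_ids, labels); labels are -100 where no prediction is made."""
    inputs = list(token_ids)
    labels = [-100] * len(inputs)
    for i, tid in enumerate(token_ids):
        if random.random() < mlm_prob:
            labels[i] = tid                               # the model must recover this token
            r = random.random()
            if r < 0.8:
                inputs[i] = mask_id                       # replace with [MASK]
            elif r < 0.9:
                inputs[i] = random.randrange(vocab_size)  # replace with a random token
            # otherwise keep the original token unchanged
    return inputs, labels
```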
