Abstract

Machine reading comprehension (MRC) is a challenging natural language processing (NLP) task with wide application potential in question answering robots, human-computer interaction in mobile virtual reality systems, and related fields. Recently, the emergence of pretrained models (PTMs) has brought this research field into a new era, in which the training objective plays a key role. The masked language model (MLM) is a self-supervised training objective widely used in various PTMs. As training objectives have evolved, many variants of the MLM have been proposed, such as whole word masking, entity masking, phrase masking, and span masking, which differ in the length of the masked token spans. Similarly, the answer length differs across machine reading comprehension tasks: an answer is often a word, a phrase, or a sentence. Thus, for MRC tasks with different answer lengths, whether the masking length of the MLM is related to performance is a question worth studying. If this hypothesis holds, it can guide us in pretraining an MLM with a mask length distribution suited to a given MRC task. In this paper, we try to uncover how much of the MLM's success on machine reading comprehension tasks comes from the correlation between the masking length distribution and the answer lengths in the MRC dataset. To address this issue, (1) we propose four MRC tasks with different answer length distributions, namely, a short span extraction task, a long span extraction task, a short multiple-choice cloze task, and a long multiple-choice cloze task; (2) we create four Chinese MRC datasets for these tasks; (3) we pretrain four masked language models according to the answer length distributions of these datasets; and (4) we conduct ablation experiments on the datasets to verify our hypothesis. The experimental results demonstrate that our hypothesis holds: on all four machine reading comprehension datasets, the model whose masking length distribution correlates with the answer lengths outperforms the model without such correlation.
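To make the idea concrete, the following is a minimal, hypothetical sketch (not the authors' released code) of span masking in which the lengths of the masked spans are sampled from a distribution chosen to match the answer lengths of an MRC dataset; the names `mask_spans`, `length_dist`, and `short_dist` are our own illustrations.

```python
import random

MASK = "[MASK]"

def mask_spans(tokens, length_dist, mask_ratio=0.15):
    """Replace random contiguous spans of `tokens` with [MASK].

    `length_dist` maps span length -> probability, e.g. estimated from
    the answer-length distribution of the target MRC dataset.
    """
    tokens = list(tokens)
    budget = max(1, int(len(tokens) * mask_ratio))
    lengths = list(length_dist.keys())
    weights = list(length_dist.values())
    masked = 0
    while masked < budget:
        # Sample a span length, then a random start position for the span.
        span_len = random.choices(lengths, weights=weights, k=1)[0]
        start = random.randrange(0, max(1, len(tokens) - span_len + 1))
        for i in range(start, min(start + span_len, len(tokens))):
            if tokens[i] != MASK:
                tokens[i] = MASK
                masked += 1
    return tokens

# A short-answer-like distribution: mostly 1-2 token spans.
short_dist = {1: 0.6, 2: 0.3, 3: 0.1}
print(mask_spans("machine reading comprehension is a challenging NLP task".split(), short_dist))
```

A long-answer-oriented variant would simply shift the probability mass of `length_dist` toward longer spans while keeping the same overall masking budget.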

Highlights

  • In the field of natural language processing (NLP), machine reading comprehension (MRC) is a challenging task and has received extensive attention

  • On four different machine reading comprehension datasets, the model whose masking length distribution correlates with the answer lengths outperforms the model without such correlation

  • In order to quantitatively verify whether masking schemes with different lengths affect the performance of the masked language model (MLM), we propose two span extraction tasks with different answer lengths for Chinese machine reading comprehension


Introduction

In the field of natural language processing (NLP), machine reading comprehension (MRC) is a challenging task and has received extensive attention. Most early reading comprehension systems were based on retrieval technology; that is, the system searches the article according to the question and returns the relevant sentences as the answer. With the development of machine learning (especially deep learning) and the release of large-scale datasets, the efficiency and quality of MRC models have been greatly improved. BERT uses unsupervised learning to pretrain on a large-scale corpus and creatively uses the MLM and NSP subtasks to enhance the language ability of the model [5]. After the authors released the code and pretrained models, BERT was immediately adopted by researchers for various NLP tasks, and the previous SOTA results were refreshed frequently and significantly.
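For reference, the BERT-style MLM objective corrupts roughly 15% of input positions, replacing 80% of the selected positions with [MASK], 10% with a random token, and leaving 10% unchanged; the model is then trained to recover the original tokens. Below is a minimal sketch of this corruption step (our own illustration under the standard 15%/80-10-10 scheme, not BERT's released implementation).

```python
import random

def bert_mlm_corrupt(token_ids, vocab_size, mask_id, mlm_prob=0.15):
    """BERT-style MLM corruption: ~15% of positions are selected; of those,
    80% become [MASK], 10% a random token, and 10% are left unchanged.
    Returns (corrupted_ids, labels); labels are -100 where no prediction is made."""
    inputs = list(token_ids)
    labels = [-100] * len(inputs)
    for i, tid in enumerate(token_ids):
        if random.random() < mlm_prob:
            labels[i] = tid                               # the model must recover this token
            r = random.random()
            if r < 0.8:
                inputs[i] = mask_id                       # replace with [MASK]
            elif r < 0.9:
                inputs[i] = random.randrange(vocab_size)  # replace with a random token
            # otherwise keep the original token unchanged
    return inputs, labels
```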
