Abstract

Machine reading comprehension (MRC) is a natural language processing task in which a question is answered based on a holistic understanding of a given context. MRC has recently attracted considerable research interest, and a large number of datasets have been released for it. MRC datasets, composed of context-query-answer triples, are designed so that a given query is answered by referencing and understanding a readily available, relevant context text. The TriviaQA dataset is weakly labeled because it contains irrelevant context that provides no basis for answering the query. Existing syntactic data cleaning methods struggle with the contextual noise this irrelevance creates, so a semantic data cleaning method that uses a reasoning process is needed. To address this, we propose a new MRC model in which the TriviaQA dataset is validated and the model is trained on the resulting high-quality dataset. The data validation method improves the quality of the training dataset, and the answer extraction model is trained on this validated data. Our proposed method improved performance on TriviaQA Wiki by 4.33% over the existing baseline model. Accordingly, it addresses the limitation of irrelevant context in MRC better than human supervision does.

Highlights

  • In the past few years, artificial intelligence has seen significant growth in many fields as a result of developments in deep learning [1]–[5]

  • Several approaches [14]–[19] that address the use of large-scale datasets for machine reading comprehension (MRC) have been proposed; the datasets used in such studies include the Stanford Question Answering Dataset (SQuAD) [20], WikiQA [21], NewsQA [22], and TriviaQA [23]

  • Overall architecture: we propose a new MRC model that uses a data validation method to improve the quality of the weakly labeled data used to train the answer extraction model


Summary

INTRODUCTION

In the past few years, artificial intelligence has seen significant growth in many fields as a result of developments in deep learning [1]–[5]. To predict the relevance of a query to a paragraph, the evidence extraction module trains BERT on a sentence pair classification task. A BERT model fine-tuned on pseudo data is used in evidence extraction to identify the relationship between a query and a paragraph of TriviaQA. The BERT model used for noisy data validation learns to determine whether a sentence contains sufficient grounds to answer the query. For this purpose, BERT is likewise trained on a sentence pair classification task, as in evidence extraction. The pre-trained BERT parameters were fine-tuned on the Wang data to perform noisy data validation, which selects the sentences required to answer the query.
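As a rough sketch of the sentence pair classification setup described above (a simplified illustration under stated assumptions, not the paper's exact pipeline): pseudo training pairs can be built by labeling a context sentence as positive when it contains the answer string and negative otherwise, and formatting each query-sentence pair in the BERT-style `[CLS] … [SEP] … [SEP]` layout a fine-tuned classifier would consume. The function names, the naive sentence splitter, and the string-containment labeling heuristic are assumptions introduced for illustration.

```python
# Illustrative sketch: building pseudo sentence-pair data for a
# query-sentence relevance classifier (BERT-style input format).
# The labeling heuristic and helper names are assumptions, not the
# paper's exact procedure.

def split_sentences(context: str) -> list[str]:
    """Naive period-based splitter; a real pipeline would use a proper
    sentence tokenizer."""
    return [s.strip() for s in context.split(".") if s.strip()]

def make_pseudo_pairs(query: str, context: str, answer: str):
    """Label each context sentence 1 if it contains the answer string
    (weak positive evidence), else 0, and format the query-sentence
    pair as: [CLS] query [SEP] sentence [SEP]."""
    pairs = []
    for sent in split_sentences(context):
        label = 1 if answer.lower() in sent.lower() else 0
        pairs.append((f"[CLS] {query} [SEP] {sent} [SEP]", label))
    return pairs

if __name__ == "__main__":
    query = "Who wrote Hamlet?"
    context = ("Hamlet is a tragedy written by William Shakespeare. "
               "It is set in Denmark. The weather in Denmark is mild.")
    for text, label in make_pseudo_pairs(query, context, "Shakespeare"):
        print(label, text)
```

Such weakly labeled pairs are what make the data "pseudo": the containment heuristic inevitably admits noisy positives, which is exactly the kind of noise the validation step is meant to filter.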
