Abstract

BERT (Bidirectional Encoder Representations from Transformers) and related pre-trained Transformers have provided large gains across many language understanding tasks, achieving a new state of the art (SOTA). BERT is pre-trained on two auxiliary tasks: Masked Language Model and Next Sentence Prediction. In this paper we introduce a new pre-training task inspired by reading comprehension that shifts the emphasis of pre-training from memorization to understanding. Span Selection Pre-Training (SSPT) poses cloze-like training instances, but rather than drawing the answer from the model's parameters, the model selects it from a relevant passage. We find significant and consistent improvements over both BERT-BASE and BERT-LARGE on multiple Machine Reading Comprehension (MRC) datasets. In particular, our proposed model obtains SOTA results on Natural Questions, a new benchmark MRC dataset, outperforming BERT-LARGE by 3 F1 points on short answer prediction. We also show significant gains on HotpotQA, improving answer prediction F1 by 4 points and supporting fact prediction F1 by 1 point, outperforming the previous best system. Moreover, we show that our pre-training approach is particularly effective when training data is limited, substantially improving the learning curve.

Highlights

  • State-of-the-art approaches for NLP tasks are based on language models that are pre-trained on tasks which do not require labeled data (Peters et al., 2018; Howard and Ruder, 2018; Devlin et al., 2018; Yang et al., 2019; Liu et al., 2019; Sun et al., 2019)

  • We provide an extensive evaluation of the span selection pre-training method across four reading comprehension tasks: the Stanford Question Answering Dataset (SQuAD) in both versions 1.1 and 2.0, the Google Natural Questions dataset (Kwiatkowski et al., 2019), and a multi-hop Question Answering dataset, HotpotQA (Yang et al., 2018)

  • The input to BERT is a concatenation of two segments x1, . . . , xM and y1, . . . , yN separated by special delimiter markers like so: [CLS], x1, . . . , xM, [SEP], y1, . . . , yN, [SEP], such that M + N < S, where S is the maximum sequence length allowed during training (see the sketch after this list)
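
The following is a minimal sketch of how such a paired input could be assembled. The 512-token limit, the longest-first truncation strategy, and the use of segment ids are assumptions based on standard BERT conventions rather than details stated in this summary.

```python
# Minimal sketch of packing two token segments into a single BERT input.
# The 512-token limit and longest-first truncation are assumptions based on
# common BERT settings; they are not specified in the text above.

MAX_SEQ_LEN = 512  # S: maximum sequence length allowed during training

def pack_segments(x_tokens, y_tokens, max_seq_len=MAX_SEQ_LEN):
    """Build [CLS] x1..xM [SEP] y1..yN [SEP] with room left for the markers."""
    budget = max_seq_len - 3  # reserve space for [CLS] and two [SEP] markers
    x, y = list(x_tokens), list(y_tokens)
    # Longest-first truncation until the pair fits (an assumed strategy).
    while len(x) + len(y) > budget:
        if len(x) >= len(y):
            x.pop()
        else:
            y.pop()
    tokens = ["[CLS]"] + x + ["[SEP]"] + y + ["[SEP]"]
    segment_ids = [0] * (len(x) + 2) + [1] * (len(y) + 1)
    return tokens, segment_ids

# Example usage with toy word-piece tokens:
tokens, segment_ids = pack_segments(["what", "is", "bert", "?"],
                                    ["bert", "is", "a", "transformer", "encoder"])
print(tokens)
print(segment_ids)
```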


Summary

Introduction

State-of-the-art approaches for NLP tasks are based on language models that are pre-trained on tasks which do not require labeled data (Peters et al., 2018; Howard and Ruder, 2018; Devlin et al., 2018; Yang et al., 2019; Liu et al., 2019; Sun et al., 2019). Pre-trained transformer models do encode a substantial number of specific facts in their parameter matrices, enabling them to answer questions directly from the model itself (Radford et al., 2019). In MRC tasks, however, the model does not need to generate an answer it has encoded in its parameters; it can instead select the answer from a supporting passage. To better align the pre-training with the needs of the MRC task, we use span selection as an additional auxiliary task. This task is similar to the cloze task, but is designed to have fewer simple instances requiring only syntactic or collocation understanding. For cloze instances that require specific knowledge, rather than training the model to encode this knowledge in its parameterization, we provide a relevant, answer-bearing passage paired with the cloze instance.
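
As an illustration of the idea, the sketch below constructs a hypothetical span selection pre-training instance: the answer term in a cloze-style query is replaced by a blank marker and paired with a passage that contains the answer, so the training target is a span in the passage rather than a fact stored in the model's parameters. The [BLANK] token name, field names, and span-labeling scheme are illustrative assumptions, not the paper's exact data format.

```python
# Illustrative sketch of a span selection pre-training (SSPT) instance.
# The [BLANK] marker, field names, and span-labeling scheme are assumptions
# for illustration; the text above only states that the answer is selected
# from a relevant passage rather than recalled from model parameters.

def make_sspt_instance(cloze_sentence, answer, passage):
    """Replace the answer term in the cloze sentence with a blank marker and
    record where the answer appears in the paired passage."""
    query = cloze_sentence.replace(answer, "[BLANK]", 1)
    start = passage.find(answer)  # character offset of the answer span
    if start < 0:
        return None  # the passage must actually contain the answer
    return {
        "query": query,                          # cloze-style question
        "passage": passage,                      # relevant, answer-bearing text
        "answer_span": (start, start + len(answer)),
    }

instance = make_sspt_instance(
    cloze_sentence="The Transformer architecture was introduced in 2017.",
    answer="2017",
    passage="Vaswani et al. introduced the Transformer architecture in 2017 "
            "in the paper 'Attention Is All You Need'.",
)
print(instance["query"])        # ... was introduced in [BLANK].
print(instance["answer_span"])  # character offsets of '2017' in the passage
```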

Related Work
Background
Architecture and setup
Objective functions
Span Selection
Extended Pre-training
True Label
MRC Tasks
Natural Questions
Method
Experiments
HotpotQA
Exploration of SSPT Instance Types
Comparison to Previous Work
Findings
Conclusion and Future Work