Abstract
A large number of reading comprehension (RC) datasets have been created recently, but little analysis has been done on whether they generalize to one another, and on the extent to which existing datasets can be leveraged for improving performance on new ones. In this paper, we conduct such an investigation over ten RC datasets, training on one or more source RC datasets and evaluating generalization, as well as transfer to a target RC dataset. We analyze the factors that contribute to generalization, and show that training on a source RC dataset and transferring to a target dataset substantially improves performance, even in the presence of powerful contextual representations from BERT (Devlin et al., 2019). We also find that training on multiple source RC datasets leads to robust generalization and transfer, and can reduce the cost of example collection for a new RC dataset. Following our analysis, we propose MultiQA, a BERT-based model trained on multiple RC datasets, which leads to state-of-the-art performance on five RC datasets. We share our infrastructure for the benefit of the research community.
Highlights
Reading comprehension (RC) is concerned with reading a piece of text and answering questions about it (Richardson et al., 2013; Berant et al., 2014; Hermann et al., 2015; Rajpurkar et al., 2016).
We find that the answer is a conclusive yes: we obtain consistent improvements with our BERT-based RC model.
We find that training on multiple source RC datasets is effective for both generalization and transfer.
Summary
Reading comprehension (RC) is concerned with reading a piece of text and answering questions about it (Richardson et al., 2013; Berant et al., 2014; Hermann et al., 2015; Rajpurkar et al., 2016). An interesting question is whether pre-training on existing RC datasets improves performance even in the presence of powerful language representations from BERT. We find that when using the high-capacity BERT-large, one can train a single model on multiple RC datasets and obtain performance close to or better than the state of the art on all of them, without fine-tuning to a particular dataset. Pre-training on an RC dataset and fine-tuning on a target dataset substantially improves performance even in the presence of contextualized word representations (BERT). We will open-source our infrastructure, which will help researchers evaluate models on a large number of datasets and gain insight into the strengths and shortcomings of their methods. We hope this will accelerate progress in language understanding. The code for the AllenNLP models is available at http://github.com/alontalmor/multiqa
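The multi-dataset training described above relies on pooling examples from several source RC datasets into a single training stream. A minimal sketch of that pooling step is shown below; the dataset names and the flat (context, question, answer) triple format are illustrative assumptions, not the paper's actual data schema or training code.

```python
import random

def mix_datasets(datasets, seed=0):
    """Pool examples from several RC datasets into one shuffled training
    stream. `datasets` maps a dataset name to a list of (context,
    question, answer) triples; this flat format is illustrative only."""
    rng = random.Random(seed)
    pooled = [(name, example)
              for name, examples in datasets.items()
              for example in examples]
    rng.shuffle(pooled)  # interleave sources so each batch mixes datasets
    return pooled

# Toy usage with two hypothetical source datasets:
sources = {
    "SQuAD": [("ctx1", "q1", "a1"), ("ctx2", "q2", "a2")],
    "TriviaQA": [("ctx3", "q3", "a3")],
}
stream = mix_datasets(sources)
print(len(stream))  # 3 examples drawn from both sources
```

A single model (e.g. BERT-large) trained on such a mixed stream sees all sources in one pass, which is the setup the summary credits with robust generalization and transfer.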