Abstract

We present DREAM, the first dialogue-based multiple-choice reading comprehension data set. Collected from English as a Foreign Language examinations designed by human experts to evaluate the comprehension level of Chinese learners of English, our data set contains 10,197 multiple-choice questions for 6,444 dialogues. In contrast to existing reading comprehension data sets, DREAM is the first to focus on in-depth multi-turn multi-party dialogue understanding. DREAM is likely to present significant challenges for existing reading comprehension systems: 84% of answers are non-extractive, 85% of questions require reasoning beyond a single sentence, and 34% of questions also involve commonsense knowledge. We apply several popular neural reading comprehension models that primarily exploit surface information within the text and find them to, at best, just barely outperform a rule-based approach. We next investigate the effects of incorporating dialogue structure and different kinds of general world knowledge into both rule-based and (neural and non-neural) machine learning-based reading comprehension models. Experimental results on the DREAM data set show the effectiveness of dialogue structure and general world knowledge. DREAM is available at https://dataset.org/dream/.

Highlights

  • A significant amount of research has focused on the construction of large-scale multiple-choice (Lai et al., 2017; Khashabi et al., 2018; Ostermann et al., 2018) and extractive (Hermann et al., 2015; Hill et al., 2016; Rajpurkar et al., 2016; Trischler et al., 2017) reading comprehension data sets (Section 2)

  • Because the original fine-tuned transformer language model (FTLM) framework already leverages rich linguistic information from a large unlabeled corpus, which can be regarded as a type of tacit general world knowledge, we investigate whether additional dialogue structure can further improve this strong baseline


Summary

Introduction

A significant amount of research has focused on the construction of large-scale multiple-choice (Lai et al., 2017; Khashabi et al., 2018; Ostermann et al., 2018) and extractive (Hermann et al., 2015; Hill et al., 2016; Rajpurkar et al., 2016; Trischler et al., 2017) reading comprehension data sets (Section 2). Source documents in these data sets have generally been drawn from formal written texts such as news, fiction, and Wikipedia articles, which are commonly considered well-written, accurate, and neutral. In contrast, answering 34% of DREAM questions requires unspoken commonsense knowledge, for example, unspoken scene information. This might be due to the nature of dialogues: for efficient oral communication, people rarely state obvious explicit world knowledge (Forbes and Choi, 2017) such as "Christmas Day is celebrated on December 25th."
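To make the task concrete, the sketch below shows what a dialogue-based multiple-choice instance of this kind might look like, modeled on the Christmas Day example above. The field names and dialogue content are illustrative assumptions, not the actual DREAM schema; note that picking the right choice requires the unstated commonsense fact that December 25th is Christmas Day.

```python
# Hypothetical dialogue-based multiple-choice reading comprehension
# instance, in the spirit of DREAM. Field names and content are
# illustrative only, not the dataset's actual format.
example = {
    "dialogue": [
        "W: Are you doing anything special on December 25th?",
        "M: Not really. Why do you ask?",
        "W: We're having a small party at home. Would you like to come?",
    ],
    "question": "On which holiday is the party most likely held?",
    "choices": ["New Year's Eve", "Christmas Day", "Thanksgiving"],
    "answer": "Christmas Day",
}

def is_correct(predicted_choice: str, instance: dict) -> bool:
    """Score a model's predicted choice against the gold answer."""
    return predicted_choice == instance["answer"]

print(is_correct("Christmas Day", example))  # True
```

The answer is non-extractive: "Christmas Day" never appears in the dialogue text, which is the property that 84% of DREAM answers share.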

