Abstract

Humans gather information through conversations involving a series of interconnected questions and answers. For machines to assist in information gathering, it is therefore essential to enable them to answer conversational questions. We introduce CoQA, a novel dataset for building Conversational Question Answering systems. Our dataset contains 127k questions with answers, obtained from 8k conversations about text passages from seven diverse domains. The questions are conversational, and the answers are free-form text with their corresponding evidence highlighted in the passage. We analyze CoQA in depth and show that conversational questions have challenging phenomena not present in existing reading comprehension datasets (e.g., coreference and pragmatic reasoning). We evaluate strong dialogue and reading comprehension models on CoQA. The best system obtains an F1 score of 65.4%, which is 23.4 points behind human performance (88.8%), indicating that there is ample room for improvement. We present CoQA as a challenge to the community at https://stanfordnlp.github.io/coqa.
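To make the described data concrete, below is a hypothetical record sketched loosely in the shape of CoQA's released JSON. The field names ("story", "questions", "answers", "turn_id", "input_text", "span_text") and the passage are illustrative assumptions, not an excerpt from the dataset: each passage carries a sequence of question turns, and each answer pairs free-form text with the evidence span that supports it.

```python
# A hypothetical CoQA-style record (field names are assumptions based on
# the abstract's description: free-form answers plus highlighted evidence).
example = {
    "story": "Jessica went to sit in her rocking chair. Today was her "
             "birthday and she was turning 80.",
    "questions": [
        {"turn_id": 1, "input_text": "Who had a birthday?"},
        {"turn_id": 2, "input_text": "How old would she be?"},  # "she" refers back to turn 1
    ],
    "answers": [
        # Free-form answer text, plus the passage span used as evidence.
        {"turn_id": 1, "input_text": "Jessica", "span_text": "Jessica went to sit"},
        {"turn_id": 2, "input_text": "80", "span_text": "she was turning 80"},
    ],
}
```

The second turn illustrates the dataset's key phenomenon: its pronoun can only be resolved through the conversation history.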

Highlights

  • We ask other people a question to either seek or test their knowledge about a subject

  • In this paper, we introduce CoQA, a Conversational Question Answering dataset for measuring the ability of machines to participate in a question-answering style conversation

  • In CoQA, a machine has to understand a text passage and answer a series of questions that appear in a conversation


Summary

Introduction

We ask other people a question to either seek or test their knowledge about a subject. In CoQA, a machine has to understand a text passage and answer a series of questions that appear in a conversation. In this conversation, every question after the first depends on the conversation history. No existing large-scale reading comprehension dataset contains questions that depend on a conversation history (see Table 1), and this is the gap CoQA is developed to fill. Moreover, many existing QA datasets restrict answers to contiguous text spans in a given passage (Table 1), whereas CoQA's answers are free-form. Humans achieve 88.8% F1, 23.4 points higher than the best model, indicating that there is considerable room for improvement.
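Since the headline numbers above are F1 scores, a minimal sketch of the word-overlap F1 commonly used in SQuAD-style reading comprehension evaluation may help. This is an illustration under assumed conventions, not the official CoQA scorer, which additionally handles multiple human references per question in its own specific way.

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace
    (the normalization conventionally applied before word-overlap F1)."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def f1_score(prediction: str, gold: str) -> float:
    """Word-level F1 between one predicted and one gold answer string."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def multi_ref_f1(prediction: str, references: list[str]) -> float:
    """Score against the best-matching human reference; a dataset-level
    score would average this over all questions."""
    return max(f1_score(prediction, ref) for ref in references)

print(multi_ref_f1("the kitchen", ["in the kitchen", "kitchen"]))  # -> 1.0
```

Because answers in CoQA are free-form rather than exact spans, this token-overlap formulation gives partial credit when a prediction and a reference share words but are not string-identical.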

Task Definition
Dataset Collection
Collection Interface
Passage Selection
Literature
Collecting Multiple Answers
Dataset Analysis
Linguistic Phenomena
Analysis of Free-form Answers
Conversation Flow
Models
Conversational Models
Reading Comprehension Models
A Combined Model
Evaluation Metric
Experimental Setup
Results and Discussion
Related Work
Conclusions