Abstract

We focus on multiple-choice question answering (QA) tasks in subject areas such as science, where answering correctly requires both broad background knowledge and facts from the given subject-area reference corpus. In this work, we explore simple yet effective methods for exploiting two sources of external knowledge for subject-area QA. The first enriches the original subject-area reference corpus with relevant text snippets extracted from an open-domain resource (i.e., Wikipedia) that cover potentially ambiguous concepts in the question and answer options. As in other QA research, the second method simply increases the amount of training data by appending additional in-domain subject-area instances. Experiments on three challenging multiple-choice science QA tasks (i.e., ARC-Easy, ARC-Challenge, and OpenBookQA) demonstrate the effectiveness of our methods: in comparison to the previous state of the art, we obtain absolute gains in accuracy of up to 8.1%, 13.0%, and 12.8%, respectively. While we observe consistent gains when we introduce knowledge from Wikipedia, we find that employing additional QA training instances is not uniformly helpful: performance degrades when the added instances exhibit a higher level of difficulty than the original training data. As one of the first studies on exploiting unstructured external knowledge for subject-area QA, we hope our methods, observations, and discussion of the exposed limitations may shed light on further developments in the area.
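
The snippet below is a minimal sketch of how the first method could look in code, assuming TF-IDF retrieval over a pre-extracted collection of Wikipedia sentences. The function name, the retrieval model, and the query construction (question text concatenated with each answer option) are illustrative assumptions, not the authors' exact pipeline.

```python
# Sketch: enrich a subject-area reference corpus with Wikipedia snippets that
# cover concepts mentioned in the question and answer options.
# TF-IDF stands in for whatever IR engine the paper actually uses;
# `wiki_sentences` is assumed to be a pre-extracted list of Wikipedia sentences.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def enrich_reference_corpus(question, options, reference_corpus,
                            wiki_sentences, top_k=5):
    """Return the reference corpus extended with Wikipedia snippets
    relevant to the question and each answer option."""
    vectorizer = TfidfVectorizer(stop_words="english")
    wiki_matrix = vectorizer.fit_transform(wiki_sentences)

    snippets = []
    for option in options:
        # Query with the question plus one option so that retrieved snippets
        # can disambiguate concepts specific to that option.
        query_vec = vectorizer.transform([question + " " + option])
        scores = cosine_similarity(query_vec, wiki_matrix).ravel()
        best = scores.argsort()[::-1][:top_k]
        snippets.extend(wiki_sentences[i] for i in best)

    # Deduplicate while preserving order, then append to the original corpus.
    seen = set()
    extra = [s for s in snippets if not (s in seen or seen.add(s))]
    return reference_corpus + extra
```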

Highlights

  • To answer questions relevant to a given text, human readers often rely on a certain amount of broad background knowledge obtained from sources outside of the text (McNamara et al., 2004; Salmeron et al., 2006)

  • We focus on multiple-choice question answering (QA) tasks in subject areas such as science, in which facts from the given reference corpus need to be combined with broadly applicable external knowledge to select the correct answer from the available options (Clark et al., 2016, 2018; Mihaylov et al., 2018)

  • Using only the extracted external corpus to perform information retrieval for reference document generation achieves reasonable performance relative to using the original reference corpus, especially on the OpenBookQA dataset (62.2% vs. 64.8% under setting 1 and 63.0% vs. 65.0% under setting 2); see the sketch after this list
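
For concreteness, the comparison in the last highlight can be pictured as swapping the sentence collection behind a simple retriever that builds one reference document per question and answer option. The TF-IDF retriever and the top-k concatenation below are assumptions used only for illustration, not the paper's actual IR component or hyperparameters.

```python
# Illustrative sketch of reference document generation: retrieve the top-k
# sentences for a (question, option) pair from whichever corpus is used
# (original reference corpus, extracted external corpus, or both combined).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_reference_document(question, option, corpus_sentences, top_k=10):
    """Concatenate the top-k retrieved sentences into one reference document."""
    vectorizer = TfidfVectorizer(stop_words="english")
    sentence_matrix = vectorizer.fit_transform(corpus_sentences)
    query_vec = vectorizer.transform([question + " " + option])
    scores = cosine_similarity(query_vec, sentence_matrix).ravel()
    top = scores.argsort()[::-1][:top_k]
    return " ".join(corpus_sentences[i] for i in top)
```

A reader model would then score each option against its reference document; the highlight compares accuracy when `corpus_sentences` comes from the extracted external corpus versus the original reference corpus.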


Summary

Introduction

To answer questions relevant to a given text (e.g., a document or a book), human readers often rely on a certain amount of broad background knowledge obtained from sources outside of the text (McNamara et al., 2004; Salmeron et al., 2006). It is perhaps not surprising that machine readers require knowledge external to the text itself to perform well on question answering (QA) tasks.

