Abstract

Systems for Open-Domain Question Answering (OpenQA) generally depend on a retriever for finding candidate passages in a large corpus and a reader for extracting answers from those passages. In much recent work, the retriever is a learned component that uses coarse-grained vector representations of questions and passages. We argue that this modeling choice is insufficiently expressive for dealing with the complexity of natural language questions. To address this, we define ColBERT-QA, which adapts the scalable neural retrieval model ColBERT to OpenQA. ColBERT creates fine-grained interactions between questions and passages. We propose an efficient weak supervision strategy that iteratively uses ColBERT to create its own training data. This greatly improves OpenQA retrieval on Natural Questions, SQuAD, and TriviaQA, and the resulting system attains state-of-the-art extractive OpenQA performance on all three datasets.
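
As a rough illustration of the fine-grained interaction the abstract refers to, the sketch below scores a question against a passage by comparing every question token embedding against every passage token embedding (ColBERT's late-interaction "MaxSim" scoring), rather than collapsing each text into a single coarse vector. The embedding dimension, normalization, and random inputs are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of ColBERT-style late-interaction ("MaxSim") scoring.
# Assumes per-token embeddings are already available; in the real model
# they come from a BERT encoder with a small projection layer.
import torch

def late_interaction_score(Q: torch.Tensor, D: torch.Tensor) -> torch.Tensor:
    """Score a question against a passage via token-level interactions.

    Q: [num_question_tokens, dim] question token embeddings
    D: [num_passage_tokens, dim] passage token embeddings
    """
    # Normalize so the dot product is cosine similarity.
    Q = torch.nn.functional.normalize(Q, dim=-1)
    D = torch.nn.functional.normalize(D, dim=-1)
    # For every question token, take its maximum similarity over passage
    # tokens, then sum the maxima. Single-vector retrievers instead compare
    # one pooled vector per question/passage, losing this granularity.
    sim = Q @ D.T                      # [num_q_tokens, num_d_tokens]
    return sim.max(dim=1).values.sum()

# Example with random stand-ins for encoder outputs:
score = late_interaction_score(torch.randn(8, 128), torch.randn(180, 128))
```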

Highlights

  • The goal of Open-Domain Question Answering (OpenQA; Voorhees and Tice, 2000) is to find answers to factoid questions in potentially massive unstructured text corpora

  • To assess ColBERT-QA, we report on experiments with Natural Questions (Kwiatkowski et al., 2019), SQuAD (Rajpurkar et al., 2016), and TriviaQA (Joshi et al., 2017)

  • We find that the answer is consistently positive: ColBERT-QA leads to state-of-the-art extractive OpenQA results


Summary

Introduction

The goal of Open-Domain Question Answering (OpenQA; Voorhees and Tice, 2000) is to find answers to factoid questions in potentially massive unstructured text corpora. ColBERT models fine-grained interactions between questions and passages while scaling to millions of documents and maintaining low query latency. We hypothesize that this form of interaction will permit our model to be sensitive to the nature of questions without compromising the OpenQA goal of scaling to massive datasets. Relevance-Guided Supervision (RGS) starts from an existing weak retrieval model (e.g., BM25) to collect the top-k passages for every training question and uses a provided weak heuristic to sort these passages into positive and negative examples, relying on the ordering imposed by the retriever; a sketch of this loop follows below. These examples are used to train a more effective retriever, and this process is applied 2–3 times, with the resulting retriever deployed in the OpenQA pipeline. Our resulting ColBERT-QA system establishes state-of-the-art retrieval and downstream performance.
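
The RGS loop described above can be summarized roughly as follows. This is a hedged sketch: `retrieve_top_k` and `train_retriever` are hypothetical placeholders for the BM25/ColBERT retrieval and ColBERT training steps, and the answer-string check stands in for the paper's weak positive/negative heuristic.

```python
# Sketch of the relevance-guided supervision (RGS) loop: each round mines
# positives and negatives with the current retriever, then trains a stronger
# retriever on them for the next round.
from typing import Iterable, List, Tuple

def has_answer(passage: str, answers: Iterable[str]) -> bool:
    """Weak heuristic: a passage is positive if it contains a gold answer string."""
    return any(ans.lower() in passage.lower() for ans in answers)

def relevance_guided_supervision(questions, gold_answers, retriever,
                                 train_retriever, rounds: int = 3, k: int = 100):
    """Iteratively use the current retriever to create training data for the next one."""
    for _ in range(rounds):
        positives: List[Tuple[str, str]] = []
        negatives: List[Tuple[str, str]] = []
        for question, answers in zip(questions, gold_answers):
            # Top-k passages from the current retriever (initially a weak one, e.g. BM25),
            # keeping the ranking order it imposes.
            ranked = retriever.retrieve_top_k(question, k)
            for passage in ranked:
                if has_answer(passage, answers):
                    positives.append((question, passage))
                else:
                    negatives.append((question, passage))
        # Train a stronger ColBERT retriever on the mined examples and iterate.
        retriever = train_retriever(positives, negatives)
    return retriever
```

Each round's retriever mines higher-quality positives and negatives than the last, which is why only 2–3 rounds are applied before deploying the final retriever in the OpenQA pipeline.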

Machine Reading Comprehension
OpenQA
Retrieval Models for OpenQA
Supervision Paradigms in OpenQA
ColBERT-QA
The ColBERT Model
Relevance-Guided Supervision
Reader Supervision for OpenQA
Evaluating ColBERT-QA’s Retrieval
Methods
Baselines
Evaluating ColBERT out of domain
Evaluating Relevance-Guided Supervision
Datasets
Our Models
Conclusion
A Datasets
B Implementation Details
C Retrieval Analysis
D RGS for Single-Vector Retrieval
Findings
E Computational Cost and Latency