Abstract

Systems for Open-Domain Question Answering (OpenQA) generally depend on a retriever for finding candidate passages in a large corpus and a reader for extracting answers from those passages. In much recent work, the retriever is a learned component that uses coarse-grained vector representations of questions and passages. We argue that this modeling choice is insufficiently expressive for dealing with the complexity of natural language questions. To address this, we define ColBERT-QA, which adapts the scalable neural retrieval model ColBERT to OpenQA. ColBERT creates fine-grained interactions between questions and passages. We propose an efficient weak supervision strategy that iteratively uses ColBERT to create its own training data. This greatly improves OpenQA retrieval on Natural Questions, SQuAD, and TriviaQA, and the resulting system attains state-of-the-art extractive OpenQA performance on all three datasets.
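
As a rough illustration of the fine-grained interaction the abstract refers to, the sketch below scores a question against a passage by comparing every question token embedding against every passage token embedding (ColBERT's late-interaction "MaxSim" scoring), rather than collapsing each text into a single coarse vector. The embedding dimension, normalization, and random inputs are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of ColBERT-style late-interaction ("MaxSim") scoring.
# Assumes per-token embeddings are already available; in the real model
# they come from a BERT encoder with a small projection layer.
import torch

def late_interaction_score(Q: torch.Tensor, D: torch.Tensor) -> torch.Tensor:
    """Score a question against a passage via token-level interactions.

    Q: [num_question_tokens, dim] question token embeddings
    D: [num_passage_tokens, dim] passage token embeddings
    """
    # Normalize so the dot product is cosine similarity.
    Q = torch.nn.functional.normalize(Q, dim=-1)
    D = torch.nn.functional.normalize(D, dim=-1)
    # For every question token, take its maximum similarity over passage
    # tokens, then sum the maxima. Single-vector retrievers instead compare
    # one pooled vector per question/passage, losing this granularity.
    sim = Q @ D.T                      # [num_q_tokens, num_d_tokens]
    return sim.max(dim=1).values.sum()

# Example with random stand-ins for encoder outputs:
score = late_interaction_score(torch.randn(8, 128), torch.randn(180, 128))
```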

Highlights

  • The goal of Open-Domain Question Answering (OpenQA; Voorhees and Tice, 2000) is to find answers to factoid questions in potentially massive unstructured text corpora

  • To assess ColBERT-QA, we report on experiments with Natural Questions (Kwiatkowski et al., 2019), SQuAD (Rajpurkar et al., 2016), and TriviaQA (Joshi et al., 2017)

  • We find that the answer is consistently positive: ColBERT-QA leads to state-of-the-art extractive OpenQA results


Summary

Introduction

The goal of Open-Domain Question Answering (OpenQA; Voorhees and Tice, 2000) is to find answers to factoid questions in potentially massive unstructured text corpora. ColBERT models fine-grained interactions between questions and passages while scaling to millions of documents and maintaining low query latency. We hypothesize that this form of interaction will permit our model to be sensitive to the nature of questions without compromising the OpenQA goal of scaling to massive datasets. Relevance-Guided Supervision (RGS) starts from an existing weak retrieval model (e.g., BM25) to collect the top-k passages for every training question and uses a provided weak heuristic to sort these passages into positive and negative examples, relying on the ordering imposed by the retriever; a sketch of this loop follows below. These examples are used to train a more effective retriever, and this process is applied 2–3 times, with the resulting retriever deployed in the OpenQA pipeline. Our resulting ColBERT-QA system establishes state-of-the-art retrieval and downstream performance.
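
The RGS loop described above can be summarized roughly as follows. This is a hedged sketch: `retrieve_top_k` and `train_retriever` are hypothetical placeholders for the BM25/ColBERT retrieval and ColBERT training steps, and the answer-string check stands in for the paper's weak positive/negative heuristic.

```python
# Sketch of the relevance-guided supervision (RGS) loop: each round mines
# positives and negatives with the current retriever, then trains a stronger
# retriever on them for the next round.
from typing import Iterable, List, Tuple

def has_answer(passage: str, answers: Iterable[str]) -> bool:
    """Weak heuristic: a passage is positive if it contains a gold answer string."""
    return any(ans.lower() in passage.lower() for ans in answers)

def relevance_guided_supervision(questions, gold_answers, retriever,
                                 train_retriever, rounds: int = 3, k: int = 100):
    """Iteratively use the current retriever to create training data for the next one."""
    for _ in range(rounds):
        positives: List[Tuple[str, str]] = []
        negatives: List[Tuple[str, str]] = []
        for question, answers in zip(questions, gold_answers):
            # Top-k passages from the current retriever (initially a weak one, e.g. BM25),
            # keeping the ranking order it imposes.
            ranked = retriever.retrieve_top_k(question, k)
            for passage in ranked:
                if has_answer(passage, answers):
                    positives.append((question, passage))
                else:
                    negatives.append((question, passage))
        # Train a stronger ColBERT retriever on the mined examples and iterate.
        retriever = train_retriever(positives, negatives)
    return retriever
```

Each round's retriever mines higher-quality positives and negatives than the last, which is why only 2–3 rounds are applied before deploying the final retriever in the OpenQA pipeline.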

Machine Reading Comprehension
OpenQA
Retrieval Models for OpenQA
Supervision Paradigms in OpenQA
ColBERT-QA
The ColBERT Model
Relevance-Guided Supervision
Reader Supervision for OpenQA
Evaluating ColBERT-QA’s Retrieval
Methods
Baselines
Evaluating ColBERT out of domain
Evaluating Relevance-Guided Supervision
Datasets
Our Models
Conclusion
A Datasets
B Implementation Details
C Retrieval Analysis
D RGS for Single-Vector Retrieval
Findings
E Computational Cost and Latency