Reasoning over Public and Private Data in Retrieval-Based Systems

Simran Arora, Jacob Kahn, Patrick Lewis, Angela Fan, Christopher Ré

doi:10.1162/tacl_a_00580

https://doi.org/10.1162/tacl_a_00580

Abstract

Users and organizations are generating ever-increasing amounts of private data from a wide range of sources. Incorporating private context is important to personalize open-domain tasks such as question-answering, fact-checking, and personal assistants. State-of-the-art systems for these tasks explicitly retrieve information that is relevant to an input question from a background corpus before producing an answer. While today’s retrieval systems assume relevant corpora are fully (e.g., publicly) accessible, users are often unable or unwilling to expose their private data to entities hosting public data. We define the Split Iterative Retrieval (SPIRAL) problem involving iterative retrieval over multiple privacy scopes. We introduce a foundational benchmark with which to study SPIRAL, as no existing benchmark includes data from a private distribution. Our dataset, ConcurrentQA, includes data from distinct public and private distributions and is the first textual QA benchmark requiring concurrent retrieval over multiple distributions. Finally, we show that existing retrieval approaches face significant performance degradations when applied to our proposed retrieval setting and investigate approaches with which these tradeoffs can be mitigated. We release the new benchmark and code to reproduce the results.
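
To make the idea of iterative retrieval over multiple privacy scopes more concrete, the following is a minimal toy sketch, not the authors' system. Everything in it is an assumption for illustration: the `Doc` type, the token-overlap scorer, and in particular the privacy rule that a query is no longer sent to the public corpus once it has been conditioned on a private document. The abstract does not spell out such a rule; it is one plausible reading of why splitting retrieval across scopes could degrade performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    text: str
    scope: str  # "public" or "private" (hypothetical labels)


def overlap(query: str, doc: Doc) -> int:
    """Toy relevance score: number of shared lowercase tokens."""
    return len(set(query.lower().split()) & set(doc.text.lower().split()))


def retrieve(query, corpus):
    """Return the best-scoring document, or None if nothing overlaps."""
    best = max(corpus, key=lambda d: overlap(query, d), default=None)
    return best if best is not None and overlap(query, best) > 0 else None


def iterative_retrieve(question, public, private, hops=2):
    """Multi-hop retrieval in which each hop's query includes the previous hit."""
    query, tainted, chain = question, False, []
    for _ in range(hops):
        # Assumed rule: a query that depends on private text
        # is only ever run against the private corpus.
        pool = private if tainted else public + private
        candidates = [d for d in pool if d not in chain]
        hit = retrieve(query, candidates)
        if hit is None:
            break
        chain.append(hit)
        tainted = tainted or hit.scope == "private"
        query = f"{question} {hit.text}"
    return chain


public = [
    Doc("Acme was founded in Berlin", "public"),
    Doc("Berlin is in Germany", "public"),
]
private = [Doc("memo: our vendor is Acme", "private")]

# First hop lands on public text, so the second hop may still search both scopes.
public_first = iterative_retrieve("where was Acme founded", public, private)

# First hop lands on the private memo, so the second hop cannot reach public text.
private_first = iterative_retrieve("which city hosts our vendor", public, private)
```

In this toy setup the second question stops after one hop: the only document that could complete the chain is public, and the assumed rule keeps the private-conditioned query away from it. That is the kind of tradeoff between privacy and retrieval quality the abstract refers to, shown here under simplified, assumed conditions.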

Journal: Transactions of the Association for Computational Linguistics
Publication Date: Aug 7, 2023
Citations: 2
License type: CC BY 4.0

Similar Papers

GhostDB
Nicolas Anciaux ... Philippe Pucheral
11 Jun 2007

Differentially private distributed logistic regression using private and public data
Zhanglong Ji ... Shuang Wang
BMC Medical Genomics | VOL. 7
01 Jan 2014

Robustness of Maximal α-Leakage to Side Information
Jiachun Liao ... Oliver Kosut
01 Jul 2019

Revelation on demand
Nicolas Anciaux ... Dennis Shasha
Distributed and Parallel Databases | VOL. 25
27 Feb 2009
