In this thesis we tackle the semantic gap, a long-standing problem in Information Retrieval (IR). The semantic gap can be described as the mismatch between users' queries and the way retrieval models answer such queries. Two main lines of work have emerged over the years to bridge the semantic gap: (i) the use of external knowledge resources to enhance the bag-of-words representations used by lexical models, and (ii) the use of semantic models to perform matching between the latent representations of queries and documents. To address this issue, we first perform an in-depth evaluation of lexical and semantic models through different analyses [Marchesin et al., 2019]. The objective of this evaluation is to understand which features lexical and semantic models share, whether their signals are complementary, and how they can be combined to effectively address the semantic gap. In particular, the evaluation focuses on (semantic) neural models and their critical aspects. Each analysis brings a different perspective to the understanding of semantic models and their relation with lexical models. The outcomes of this evaluation highlight the differences between lexical and semantic signals, and the need to combine them at the early stages of the IR pipeline to effectively address the semantic gap.
We then build on the insights of this evaluation to develop lexical and semantic models that address the semantic gap. Specifically, we develop unsupervised models that integrate knowledge from external resources, and we evaluate them in the medical domain - a domain with high social value, where the semantic gap is prominent and where the abundance of authoritative knowledge resources allows us to explore effective ways to address it. For lexical models, we investigate how - and to what extent - concepts and relations stored within knowledge resources can be integrated into query representations to improve the effectiveness of lexical models. To this end, we propose and evaluate several knowledge-based query expansion and reduction techniques [Agosti et al., 2018, 2019; Di Nunzio et al., 2019]. These query reformulations increase the probability of retrieving relevant documents by adding highly specific terms to, or removing them from, the original query. The experimental analyses on different test collections for Precision Medicine - a particular use case of Clinical Decision Support (CDS) - show the effectiveness of the proposed query reformulations. In particular, a specific subset of query reformulations allows lexical models to achieve top-performing results on all the considered collections.
Regarding semantic models, we first analyze the limitations of the knowledge-enhanced neural models presented in the literature. Then, to overcome these limitations, we propose SAFIR [Agosti et al., 2020], an unsupervised knowledge-enhanced neural framework for IR. SAFIR integrates external knowledge into the learning process of neural IR models and does not require labeled data for training. Thus, the representations learned within this framework are optimized for IR and encode linguistic features that are relevant to address the semantic gap. The evaluation on different test collections for CDS demonstrates the effectiveness of SAFIR when used to perform retrieval over the entire document collection or to retrieve documents for Pseudo Relevance Feedback (PRF) methods - that is, when it is used at the early stages of the IR pipeline.
In particular, the quantitative and qualitative analyses highlight the ability of SAFIR to retrieve relevant documents affected by the semantic gap, as well as the effectiveness of combining lexical and semantic models at the early stages of the IR pipeline - where the complementary signals they provide can be used to obtain better answers to semantically hard queries.
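To make the two directions summarized above concrete, the following Python sketch illustrates, in a deliberately simplified form, (i) knowledge-based query expansion from an external resource and (ii) an early-stage fusion of lexical and semantic scores before any PRF or re-ranking step. This is a minimal illustrative sketch, not the thesis implementation: the knowledge-resource fragment, the scoring functions, and the fusion weight are hypothetical stand-ins for the proposed query reformulations and for the representations learned by a framework such as SAFIR.

```python
# Illustrative sketch (not the thesis code): knowledge-based query expansion
# followed by early-stage fusion of lexical and semantic scores.
from collections import Counter
import math

# Hypothetical fragment of a medical knowledge resource: term -> related terms.
KNOWLEDGE_RESOURCE = {
    "melanoma": ["skin cancer", "braf"],
    "braf": ["v600e", "vemurafenib"],
}

def expand_query(query, max_new_terms=3):
    """Add highly specific related terms drawn from the knowledge resource."""
    terms = query.lower().split()
    expansion = []
    for t in terms:
        expansion.extend(KNOWLEDGE_RESOURCE.get(t, []))
    return terms + expansion[:max_new_terms]

def lexical_score(query_terms, doc_terms):
    """Toy bag-of-words overlap score standing in for a lexical model (e.g. BM25)."""
    doc_counts = Counter(doc_terms)
    return sum(math.log(1 + doc_counts[t]) for t in query_terms)

def semantic_score(query_vec, doc_vec):
    """Cosine similarity between dense representations (e.g. learned by a neural model)."""
    dot = sum(q * d for q, d in zip(query_vec, doc_vec))
    nq = math.sqrt(sum(q * q for q in query_vec))
    nd = math.sqrt(sum(d * d for d in doc_vec))
    return dot / (nq * nd) if nq and nd else 0.0

def fused_ranking(query, docs, doc_vecs, query_vec, alpha=0.5):
    """Combine lexical and semantic signals at the first retrieval stage.

    In practice the two score distributions would be normalized before fusion;
    the linear combination here only illustrates the idea.
    """
    q_terms = expand_query(query)
    scores = []
    for doc_id, text in docs.items():
        lex = lexical_score(q_terms, text.lower().split())
        sem = semantic_score(query_vec, doc_vecs[doc_id])
        scores.append((doc_id, alpha * lex + (1 - alpha) * sem))
    return sorted(scores, key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    docs = {
        "d1": "BRAF V600E mutation in melanoma treated with vemurafenib",
        "d2": "General overview of skin lesions",
    }
    doc_vecs = {"d1": [0.9, 0.1, 0.3], "d2": [0.2, 0.8, 0.1]}
    print(fused_ranking("melanoma BRAF", docs, doc_vecs, query_vec=[0.8, 0.2, 0.4]))
```

The fused ranking produced this way could then feed a PRF method, which is the early-stage combination of lexical and semantic signals that the evaluation above argues for.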