Traditional Inverted Index Research Articles

Pseudo-relevance feedback mechanisms, from Rocchio to the relevance models, have shown the usefulness of expanding and reweighting the users’ initial queries using information occurring in an initial set of retrieved documents, known as the pseudo-relevant set. Recently, dense retrieval, which uses neural contextual language models such as BERT to analyse the contents of documents and queries and to compute their relevance scores, has shown promising performance on several information retrieval tasks still relying on the traditional inverted index for identifying documents relevant to a query. Two different dense retrieval families have emerged: those that use a single embedded representation for each passage and query (e.g., BERT’s [CLS] token), and those that use multiple representations (e.g., an embedding for each token of the query and document, as exemplified by ColBERT). In this work, we conduct the first study into the potential for multiple-representation dense retrieval to be enhanced using pseudo-relevance feedback, and present our proposed approach, ColBERT-PRF. In particular, based on the pseudo-relevant set of documents identified by a first-pass dense retrieval, ColBERT-PRF extracts representative feedback embeddings from the document embeddings of the pseudo-relevant set. Among these representative feedback embeddings, those that most highly discriminate among documents are employed as expansion embeddings, which are then added to the original query representation. We show that these additional expansion embeddings enhance the effectiveness both of a reranking of the initial query results and of an additional dense retrieval operation.
Indeed, experiments on the MSMARCO passage ranking dataset show that applying our proposed ColBERT-PRF method to a ColBERT dense retrieval approach can improve MAP by up to 26% on the TREC 2019 query set and 10% on the TREC 2020 query set. We further validate the effectiveness of our proposed pseudo-relevance feedback technique for a dense retrieval model on the MSMARCO document ranking and TREC Robust04 document ranking tasks. For instance, on the MSMARCO document ranking task, ColBERT-PRF improves MAP over the ColBERT E2E model by up to 21% and 14% on the TREC 2019 and TREC 2020 query sets, respectively. Additionally, we study the effectiveness of variants of the ColBERT-PRF model that use different weighting methods. Finally, we show that, through the application of approximate scoring and different clustering methods, ColBERT-PRF can be made more efficient, attaining up to a 4.54× speedup over the default ColBERT-PRF model with little impact on effectiveness.
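
To make the expansion step concrete, here is a minimal Python/NumPy sketch built on assumptions of our own; it is not the authors' implementation. It clusters the pooled token embeddings of the pseudo-relevant documents with plain k-means to obtain representative feedback embeddings, maps each centroid to the token embedding with the highest dot product, treats that token's IDF as the measure of how well the centroid discriminates among documents, and appends the top `n_exp` centroids to the query with weight `beta * idf`. Documents are then scored with ColBERT-style late interaction (MaxSim). The function names (`maxsim_score`, `kmeans`, `prf_expand`), the `token_embs` lookup table, the IDF formula and the parameters `k`, `n_exp` and `beta` are hypothetical. Real ColBERT embeddings are L2-normalised, and a practical system would find the nearest tokens with an approximate nearest-neighbour index rather than a dense matrix product.

```python
import numpy as np


def maxsim_score(query_embs, query_weights, doc_embs):
    """ColBERT-style late interaction: each query embedding is matched to its
    best document embedding (max dot product); matches are weighted and summed."""
    sims = query_embs @ doc_embs.T  # shape (n_query, n_doc_tokens)
    return float((sims.max(axis=1) * query_weights).sum())


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, dim) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def prf_expand(query_embs, feedback_doc_embs, token_embs, doc_freq, n_docs,
               k=4, n_exp=2, beta=1.0):
    """Hypothetical ColBERT-PRF-style query expansion (illustrative sketch).

    feedback_doc_embs: list of (n_tokens_i, dim) arrays, one per pseudo-relevant doc.
    token_embs: (vocab, dim) array with one embedding per token id (assumption).
    doc_freq: (vocab,) document frequency of each token id.
    Returns the expanded query embeddings and one weight per embedding.
    """
    pooled = np.vstack(feedback_doc_embs)
    centroids = kmeans(pooled, k)  # representative feedback embeddings
    nearest_tok = (centroids @ token_embs.T).argmax(axis=1)  # highest-dot-product token
    idf = np.log((n_docs + 1) / (doc_freq[nearest_tok] + 1))  # assumed discriminativeness
    top = np.argsort(-idf)[:n_exp]  # most discriminative centroids become expansions
    expanded = np.vstack([query_embs, centroids[top]])
    weights = np.concatenate([np.ones(len(query_embs)), beta * idf[top]])
    return expanded, weights


# Toy usage: random vectors stand in for real ColBERT embeddings.
rng = np.random.default_rng(42)
dim = 8
query = rng.normal(size=(3, dim))
docs = [rng.normal(size=(10, dim)) for _ in range(3)]
token_embs = rng.normal(size=(50, dim))
doc_freq = rng.integers(1, 100, size=50)
q_exp, w = prf_expand(query, docs, token_embs, doc_freq, n_docs=1000)
reranked = sorted(range(len(docs)), key=lambda i: -maxsim_score(q_exp, w, docs[i]))
print(q_exp.shape, w.shape)  # (5, 8) (5,)
```

Original query embeddings keep weight 1 while expansion embeddings are weighted by their IDF, so rare, discriminative feedback concepts count more; a different weighting scheme could be swapped in at the `idf` line.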

In recent years, topic modeling has been gaining significant momentum in information retrieval (IR). Researchers have found that using the topic information generated through topic modeling together with traditional TF-IDF information produces superior results in document retrieval. However, before this idea can be applied to real-life IR systems, two critical problems need to be solved: how to store the topic information, and how to use it together with the TF-IDF information for efficient document retrieval. In this paper, we propose the Topic Enhanced Inverted Index (TEII), which incorporates topic information into the inverted index for efficient top-k document retrieval. Specifically, we explore two types of TEII. We first propose the incremental TEII, which adds topic-based inverted lists to the traditional inverted index. The incremental TEII is beneficial for legacy IR systems, since it does not change the existing TF-IDF-based inverted lists. As a more flexible alternative, we propose the hybrid TEII, which incorporates the topic information into each posting of the inverted index. For the hybrid TEII, we propose two relaxation methods that support dynamic estimation of the upper-bound impact of each posting. The hybrid TEII is highly extensible to other ranking factors; as an example, we extend it to consider the static quality of the documents in the corpus. Based on the incremental and hybrid TEIIs, we develop several query processing algorithms that support efficient top-k document retrieval. Empirical evaluation on the TREC dataset verifies the effectiveness and efficiency of the proposed index structures and query processing algorithms.
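
As an illustration of the hybrid layout, here is a minimal Python sketch; it is not the paper's algorithms. Each posting carries both a TF-IDF weight and a topic weight, the two are blended into a single impact with a hypothetical mixing parameter `alpha`, and top-k retrieval runs Fagin's Threshold Algorithm (a standard top-k technique substituted here) over impact-sorted lists. The search stops once the sum of the last-seen impacts, an upper bound on the score of any unseen document, can no longer beat the current k-th best score. The `Posting` class, `build_hybrid_index`, `top_k` and the toy weights are hypothetical, and the paper's relaxation methods for dynamic upper-bound estimation and its static-quality extension are not reproduced.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Posting:
    doc_id: int
    tfidf: float  # TF-IDF weight of the term in this document
    topic: float  # hypothetical topic-based weight, e.g. an LDA-derived p(term | doc)


def build_hybrid_index(postings_by_term, alpha=0.5):
    """Blend both weights into one non-negative impact per posting. Each term keeps
    (impact-sorted list for sorted access, doc_id -> impact dict for random access)."""
    index = {}
    for term, postings in postings_by_term.items():
        impacts = {p.doc_id: alpha * p.tfidf + (1 - alpha) * p.topic for p in postings}
        by_impact = sorted(impacts.items(), key=lambda kv: kv[1], reverse=True)
        index[term] = (by_impact, impacts)
    return index


def top_k(index, query_terms, k):
    """Fagin's Threshold Algorithm over impact-sorted posting lists."""
    lists = [index[t] for t in query_terms if t in index]
    heap = []  # min-heap of (score, doc_id) holding the current top-k
    seen = set()
    depth = 0
    while True:
        threshold = 0.0  # upper bound on the score of any not-yet-seen document
        progressed = False
        for by_impact, _ in lists:
            if depth < len(by_impact):
                progressed = True
                doc_id, impact = by_impact[depth]
                threshold += impact
                if doc_id not in seen:
                    seen.add(doc_id)
                    # Random access: full score summed over all query-term lists.
                    score = sum(impacts.get(doc_id, 0.0) for _, impacts in lists)
                    if len(heap) < k:
                        heapq.heappush(heap, (score, doc_id))
                    elif score > heap[0][0]:
                        heapq.heapreplace(heap, (score, doc_id))
        depth += 1
        # Stop when every list is exhausted or no unseen document can beat the k-th score.
        if not progressed or (len(heap) == k and heap[0][0] >= threshold):
            break
    return sorted(heap, reverse=True)


# Toy data (hypothetical weights).
postings = {
    "index": [Posting(1, 0.9, 0.2), Posting(2, 0.4, 0.8), Posting(3, 0.1, 0.1)],
    "topic": [Posting(2, 0.7, 0.9), Posting(3, 0.6, 0.4)],
}
idx = build_hybrid_index(postings, alpha=0.5)
result = top_k(idx, ["index", "topic"], k=2)
print([doc_id for _, doc_id in result])  # [2, 3]
```

The stopping test is valid only because every impact is non-negative and each list is sorted by descending impact; a document missing from a list simply contributes zero for that term.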

Articles published on Traditional Inverted Index

ColBERT-PRF: Semantic Pseudo-Relevance Feedback for Dense Passage and Document Retrieval

A Keyword-Grouping Inverted Index Based Multi-Keyword Ranked Search Scheme Over Encrypted Cloud Data

Using Inverted Index for Fingerprint Search

A verifiable ranked ciphertext retrieval scheme based on bilinear mapping

Pavo: A RNN-Based Learned Inverted Index, Supervised or Unsupervised?

CCQA: A Chinese Community Question-answering Tool Based on Sentence Similarity with Word Embedding

TEII: Topic enhanced inverted index for top-k document retrieval

Ginix: Generalized inverted index for keyword search

A social inverted index for social-tagging-based information retrieval

A Design of the Inverted Index Based on Web Document Comprehending

An effective 3-in-1 keyword search method over heterogeneous data sources

An effective and versatile keyword search engine on heterogenous data sources
