Abstract

Query expansion aims to mitigate the mismatch between the language used in a query and in a document. However, query expansion methods risk introducing non-relevant information when expanding the query. To bridge this gap, inspired by recent advances in applying contextualized models like BERT to the document retrieval task, this paper proposes a novel query expansion model that leverages the strength of the BERT model to select relevant document chunks for expansion. In evaluations on the standard TREC Robust04 and GOV2 test collections, the proposed BERT-QE model significantly outperforms BERT-Large models.

Highlights

  • In information retrieval, the language used in a query and in a document differs in terms of verbosity, formality, and even the format

  • In order to reduce this gap, different query expansion methods have been proposed and have enjoyed success in improving document rankings. Such methods commonly take a pseudo relevance feedback (PRF) approach in which the query is expanded using top-ranked documents and the expanded query is used to rank the search results (Rocchio, 1971; Lavrenko and Croft, 2001; Amati, 2003; Metzler and Croft, 2007). Due to their reliance on pseudo relevance information, such expansion methods suffer from any non-relevant information in the feedback documents, which can pollute the query after expansion

  • For the proposed BERT-QE, in phase two, kd = 10 top-ranked documents from the phase-one search results are used, from which kc = 10 chunks are selected for expansion, with a chunk length of m = 10 (see the sketch after this list)
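
Below is a minimal, runnable sketch of how such a chunk-based expansion pipeline could look, assuming the kd/kc/m values quoted above. The bert_score() helper is a toy word-overlap stand-in for a fine-tuned BERT relevance scorer, and the softmax chunk weighting and the interpolation weight alpha are illustrative assumptions rather than the paper's exact formulation.

import math


def bert_score(text_a: str, text_b: str) -> float:
    """Toy relevance score: word overlap. A real system would call BERT here."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / (len(a | b) or 1)


def split_into_chunks(doc: str, m: int):
    """Split a document into non-overlapping chunks of m words."""
    words = doc.split()
    return [" ".join(words[i:i + m]) for i in range(0, len(words), m)]


def bert_qe_rerank(query, ranked_docs, kd=10, kc=10, m=10, alpha=0.5):
    """Re-rank phase-one results (best first) using selected expansion chunks."""
    # Phase two: score all chunks from the top kd documents against the query
    # and keep the kc highest-scoring chunks.
    scored_chunks = [
        (chunk, bert_score(query, chunk))
        for doc in ranked_docs[:kd]
        for chunk in split_into_chunks(doc, m)
    ]
    top_chunks = sorted(scored_chunks, key=lambda x: x[1], reverse=True)[:kc]

    # Softmax-normalize the chunk scores into weights.
    total = sum(math.exp(s) for _, s in top_chunks) or 1.0
    weights = [(c, math.exp(s) / total) for c, s in top_chunks]

    # Phase three: combine the query-document score with the evidence from
    # the selected chunks and re-rank.
    rescored = []
    for doc in ranked_docs:
        chunk_evidence = sum(w * bert_score(chunk, doc) for chunk, w in weights)
        rescored.append((doc, alpha * bert_score(query, doc) + (1 - alpha) * chunk_evidence))
    return [doc for doc, _ in sorted(rescored, key=lambda x: x[1], reverse=True)]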


Summary

Introduction

The language used in a query and in a document differs in terms of verbosity, formality, and even the format (e.g., the use of keywords in a query versus the use of natural language in an article from Wikipedia). In order to reduce this gap, different query expansion methods have been proposed and have enjoyed success in improving document rankings. Such methods commonly take a pseudo relevance feedback (PRF) approach in which the query is expanded using top-ranked documents and the expanded query is used to rank the search results (Rocchio, 1971; Lavrenko and Croft, 2001; Amati, 2003; Metzler and Croft, 2007). In the context of neural approaches, the recent neural PRF architecture (Li et al., 2018) uses feedback documents directly for expansion. All these methods are under-equipped to accurately evaluate the relevance of information pieces used for expansion. This can be caused by the mixing of relevant and non-relevant information in the expansion, as with the tokens in RM3 (Lavrenko and Croft, 2001) and the documents in NPRF (Li et al., 2018), or by the fact that the models used for selecting and re-weighting the expansion information are not powerful enough, as they are essentially scalars based on counting.
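
To make the "scalars based on counting" point concrete, the following is a minimal sketch of count-based pseudo relevance feedback in the spirit of RM3 (Lavrenko and Croft, 2001): expansion terms are weighted by simple frequency statistics over the top-ranked feedback documents. The tokenization, the number of expansion terms, and the interpolation weight are illustrative assumptions, not the published formulation.

from collections import Counter


def prf_expand(query_terms, feedback_docs, num_expansion_terms=10, orig_weight=0.6):
    """Return an expanded, weighted query built from top-ranked feedback documents."""
    # Count how often each term appears across the feedback documents.
    counts = Counter()
    for doc in feedback_docs:
        counts.update(doc.lower().split())

    # Turn raw counts into relative frequencies for the most common terms.
    total = sum(counts.values()) or 1
    expansion = {t: c / total for t, c in counts.most_common(num_expansion_terms)}

    # Interpolate the original query terms with the expansion terms.
    expanded = {t: orig_weight / len(query_terms) for t in query_terms}
    for term, weight in expansion.items():
        expanded[term] = expanded.get(term, 0.0) + (1 - orig_weight) * weight
    return expanded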


