Abstract

Information retrieval (IR) is a fundamental task in many real-world applications such as Web search, question answering systems, and digital libraries. The core of IR is to identify information resources relevant to a user’s information need. Since there may be more than one relevant resource, the returned result is often organized as a list of documents ranked by their degree of relevance to the information need. This ranking property distinguishes IR from other tasks, and researchers have devoted substantial effort to developing a variety of ranking models for IR. In recent years, the resurgence of deep learning has greatly advanced this field and given rise to an active research area known as neural information retrieval (NeuIR), with particular attention to the paradigm of pre-training methods (PTMs). Owing to sophisticated pre-training objectives and large model sizes, pre-trained models can learn universal language representations from massive textual data, and these representations benefit the ranking task of IR. Considering the rapid progress in this direction, this survey provides a systematic review of PTMs in IR. The authors present an overview of PTMs applied in different components of an IR system, including the retrieval component and the re-ranking component. In addition, they introduce PTMs specifically designed for IR and summarize available datasets as well as benchmark leaderboards. Lastly, they discuss some open challenges and highlight several promising directions, with the hope of inspiring and facilitating more work on these topics in future research.
