Abstract
Passage retrievers based on neural language models have recently achieved significant performance improvements on ranking tasks. Such ranking models capture the contextual features of queries and documents better than traditional keyword-based methods. However, these deep learning-based models are limited by the large amount of training data they require. We propose a new fine-tuning method based on the masked language model (MLM) objective typically used in pre-trained language models. Our model improves ranking performance using the MLM while making efficient use of limited training data through data augmentation. The proposed approach applies self-supervised learning to information retrieval without requiring additional, expensive labeled data. In addition, because masking important terms during the fine-tuning stage can undermine ranking performance, importance values for each term and sentence in a passage are calculated using the BM25 scheme and applied during fine-tuning so that more important terms are masked less often. Our model is trained on the dataset from the MS MARCO re-ranking leaderboard and achieves state-of-the-art MRR@10 performance on the leaderboard among all but the ensemble-based methods.
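To make the importance-aware masking idea concrete, the following Python sketch shows one possible way to combine BM25-style term weights with MLM masking so that more important terms are masked less often. The function names, hyperparameters (k1, b, base_rate), and the exact scaling of the masking probability are illustrative assumptions, not the paper's actual implementation.

```python
import math
import random
from collections import Counter

def bm25_term_weights(passage_tokens, doc_freqs, num_docs, avg_len, k1=1.2, b=0.75):
    """Assign each term in a passage a BM25-style importance weight.

    doc_freqs: precomputed corpus document frequencies (assumed available).
    Higher weights indicate terms that matter more for ranking the passage.
    """
    tf = Counter(passage_tokens)
    dl = len(passage_tokens)
    weights = {}
    for term, freq in tf.items():
        df = doc_freqs.get(term, 0)
        idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
        weights[term] = idf * (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * dl / avg_len))
    return weights

def importance_aware_mask(passage_tokens, weights, base_rate=0.15, mask_token="[MASK]"):
    """Mask tokens for MLM fine-tuning, masking high-weight (important) terms less often."""
    max_w = max(weights.values(), default=1.0) or 1.0
    masked = []
    for tok in passage_tokens:
        # Scale the masking probability down in proportion to the term's importance.
        p_mask = base_rate * (1.0 - weights.get(tok, 0.0) / max_w)
        masked.append(mask_token if random.random() < p_mask else tok)
    return masked
```

In this sketch the masking probability falls linearly with normalized BM25 weight; other monotone schedules (e.g. softmax-normalized weights) would also fit the described scheme.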