Abstract

In traditional transductive learning, all queries are used to generate pseudo-labels for learning to rank when sufficient training data are not available. However, low-quality queries may degrade retrieval performance. We therefore argue that improving the quality of the queries used in transductive learning is important for training an effective ranking model. Using a small number of reliable samples together with data close to the classification boundaries, we propose building a query quality estimator that relates the benefit in retrieval performance to features of the normalized query commitment that influence query quality. In our proposed transduction model, all available queries are filtered by this estimator, and only high-quality queries, those that improve retrieval effectiveness, are used to generate pseudo-labels for learning to rank. Queries that may degrade performance are discarded when the pseudo-labels are created. The pseudo-labels aggregated from high-quality queries are then leveraged in learning-to-rank scenarios that lack sufficient training data. Extensive experiments on the standard LETOR 4.0 dataset show that the proposed method can outperform strong baselines, improving the average normalized discounted cumulative gain by up to 7.77% in some cases.

Highlights

  • Several techniques have been proposed in recent decades to construct ranking models for information retrieval, including traditional heuristic methods, probabilistic methods, and machine learning methods [1,2]

  • Framework of the proposed transductive learning with limited reliable labels and low-confidence examples: the proposed method learns a query quality estimator from a few reliable examples and from low-confidence examples located close to the classification boundaries, and uses it to select high-quality queries

  • We first evaluated the retrieval effectiveness of our proposed method (PTL) against traditional transductive learning (TTL), which uses all available queries to generate pseudo-labels for learning to rank


Summary

Introduction

Several techniques have been proposed in recent decades to construct ranking models for information retrieval, including traditional heuristic methods, probabilistic methods, and machine learning methods [1,2]. Transductive learning [5,6,7], a semi-supervised mode of learning, is often used to iteratively aggregate pseudo-labels for learning to rank in information retrieval when sufficient training data are not available [8]. Only 13% of the five top-ranking documents in the initial retrieval results on MQ2008, a subset of LETOR 4.0 [11], were found to be relevant to the queries. In this case, applying transductive learning introduced noise into the pseudo-positive examples, which significantly degraded retrieval performance. To guarantee the effectiveness of learning to rank in information retrieval, it is therefore necessary to enhance the quality of the queries and their associated pseudo-labels during transductive learning.
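The idea of filtering queries before aggregating pseudo-labels can be illustrated with a minimal Python sketch. It is not the estimator described in this paper: the function names, the threshold, the choice of the bottom-k documents as pseudo-negatives, and the toy score_spread quality measure are all assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Tuple

# One initial retrieval result: (doc_id, retrieval score) pairs in any order.
Ranking = List[Tuple[str, float]]


def pseudo_label(ranking: Ranking, k: int = 5) -> Dict[str, int]:
    """Mark the top-k documents as pseudo-positive (1) and the bottom-k as
    pseudo-negative (0). Using the bottom-k as negatives is an assumption.
    Expects k >= 1."""
    ordered = sorted(ranking, key=lambda pair: pair[1], reverse=True)
    labels = {doc_id: 1 for doc_id, _ in ordered[:k]}
    for doc_id, _ in ordered[-k:]:
        labels.setdefault(doc_id, 0)  # never overwrite a pseudo-positive
    return labels


def aggregate_pseudo_labels(
    rankings: Dict[str, Ranking],
    quality: Callable[[Ranking], float],
    threshold: float,
    k: int = 5,
) -> Dict[str, Dict[str, int]]:
    """Generate pseudo-labels only for queries whose estimated quality
    reaches the (hypothetical) threshold; all other queries are discarded."""
    return {
        qid: pseudo_label(ranking, k)
        for qid, ranking in rankings.items()
        if quality(ranking) >= threshold
    }


def score_spread(ranking: Ranking) -> float:
    """Toy stand-in for a query quality estimator: the population standard
    deviation of the retrieval scores. Not the paper's estimator."""
    scores = [score for _, score in ranking]
    mean = sum(scores) / len(scores)
    return (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5


if __name__ == "__main__":
    rankings = {
        "q1": [("a", 9.0), ("b", 5.0), ("c", 1.0), ("d", 0.5)],
        "q2": [("e", 1.0), ("f", 1.0), ("g", 1.0), ("h", 1.0)],
    }
    # q2 has identical scores, so its spread is 0 and it is filtered out.
    print(aggregate_pseudo_labels(rankings, score_spread, threshold=0.5, k=1))
```

In this sketch a query with undifferentiated retrieval scores never contributes pseudo-labels, which mirrors the motivation above: when few of the top-ranked documents are actually relevant, labelling them as positives only adds noise to the training set.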

