Abstract

ABSTRACT Learning-to-rank (LtR) algorithms are at the heart of modern day information retrieval systems. While a good number of LtR algorithms have been developed and scrutinized over the past decade, theoretical underpinnings of these algorithms are not thoroughly investigated so far. Amongst the theoretical aspects of a supervised learning algorithm, the bias-variance profiles are of utmost importance. In this article we aim to better understand the bias-variance profiles of rank-learning algorithms. Firstly, we formalize the bias and variance from a pointwise perspective where each query-document pair is treated as an individual training example. Secondly, we develop a framework to analyze the variability and systematic error of a rank-learner in terms of its ranking error, i.e., we analyze the bias-variance from a listwise perspective where a query and all of its associated documents are treated as a single training example. After developing the theoretical framework, we move on to test its applicability in practice. In particular, we choose a promising algorithm, namely random forest-based rank-learning algorithms for our investigation. We study the effect of varying an important parameter of the algorithm, namely the sub-sample size used to learn each tree, on bias and variance. Our hypothesis is that as the sub-sample size (per tree) increases, classical bias-variance tradeoff should be observed. On two large LtR benchmark datasets, experimental results show that our hypothesis holds true. We also explain the relative performance of two of the top performing LtR algorithms using their bias and variance profiles. To the best of our knowledge, this article presents the first thorough investigation into bias and variance analysis of rank-learning algorithms.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.