Abstract

When learning-to-rank algorithms are applied in real search applications, noise in human-labeled training data is inevitable and degrades the performance of the algorithms. Previous work has mainly focused on how noise affects ranking algorithms and how to design robust ranking algorithms. In our work, we investigate which inherent characteristics make training data robust to label noise. Our motivation comes from the observation that the same ranking algorithm may show very different sensitivities to label noise on different data sets. We therefore investigate the underlying reason for this observation using two typical kinds of learning-to-rank algorithms (i.e.~pairwise and listwise methods) and three public data sets (i.e.~OHSUMED, TD2003 and MSLR-WEB10K). We find that, as label noise in the training data increases, it is the \emph{document pair noise ratio} (i.e.~\emph{pNoise}) rather than the \emph{document noise ratio} (i.e.~\emph{dNoise}) that accounts well for the performance degradation of a ranking algorithm.
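
To make the distinction between the two ratios concrete, the following is a minimal sketch for a single query. The abstract does not give formal definitions, so the sketch assumes that dNoise is the fraction of documents whose noisy label differs from the true label, and that pNoise is the fraction of document pairs whose relative preference (higher, lower, or tied) differs between the true and the noisy labels. The function names `d_noise` and `p_noise` are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def _sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def d_noise(true_labels, noisy_labels):
    """Assumed dNoise: fraction of documents whose label was changed."""
    if len(true_labels) != len(noisy_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        return 0.0
    wrong = sum(t != n for t, n in zip(true_labels, noisy_labels))
    return wrong / len(true_labels)


def p_noise(true_labels, noisy_labels):
    """Assumed pNoise: fraction of document pairs whose preference changed.

    A pair (i, j) counts as noisy when the sign of the label difference
    under the noisy labels differs from the sign under the true labels.
    """
    if len(true_labels) != len(noisy_labels):
        raise ValueError("label lists must have the same length")
    pairs = list(combinations(range(len(true_labels)), 2))
    if not pairs:
        return 0.0
    wrong = sum(
        _sign(true_labels[i] - true_labels[j])
        != _sign(noisy_labels[i] - noisy_labels[j])
        for i, j in pairs
    )
    return wrong / len(pairs)


# One query with four documents; only the single relevant document is mislabeled.
true_labels = [2, 0, 0, 0]
noisy_labels = [0, 0, 0, 0]
print(d_noise(true_labels, noisy_labels))  # 0.25
print(p_noise(true_labels, noisy_labels))  # 0.5
```

In this toy query, one mislabeled document out of four corrupts three of the six document pairs, so under these assumed definitions the same amount of document-level noise can correspond to a much larger amount of pair-level noise, depending on how the labels are distributed.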
