Learning to Rank from Noisy Data

Wenkui Ding, Xu-Dong Zhang, Xiubo Geng

doi:10.1145/2576230

Abstract

Learning to rank, which learns a ranking function from training data, has emerged as a research area in information retrieval and machine learning. Most existing work on learning to rank assumes that the training data is clean, which is not always true. The ambiguity of query intent, the lack of domain knowledge, and the vague definition of relevance levels all make it difficult for common annotators to give reliable relevance labels to some documents. As a result, the relevance labels in the training data for learning to rank usually contain noise. If this fact is ignored, the performance of learning-to-rank algorithms suffers.

In this article, we propose taking labeling noise into account in the process of learning to rank, using a two-step approach that extends existing algorithms to handle noisy training data. In the first step, we estimate the degree of labeling noise for each training document. To this end, we assume that the majority of the relevance labels in the training data are reliable, and we use a graphical model to describe the generative process of a training query, the feature vectors of its associated documents, and the relevance labels of these documents. The parameters of the graphical model are learned by maximum likelihood estimation. The conditional probability of the relevance label given the feature vector of a document is then computed. If this probability is large, we regard the degree of labeling noise for the document as small; otherwise, we regard it as large. In the second step, we extend existing learning-to-rank algorithms by incorporating the estimated degree of labeling noise into their loss functions. Specifically, we give larger weights to training documents with smaller degrees of labeling noise and smaller weights to those with larger degrees of labeling noise. As examples, we demonstrate the extensions for McRank, RankSVM, RankBoost, and RankNet.

Empirical results on benchmark datasets show that the proposed approach can effectively distinguish noisy documents from clean ones, and that the extended learning-to-rank algorithms achieve better performance than the baselines.
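The second step described in the abstract, reweighting a loss function by the estimated labeling noise, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact weighting formula or the graphical model, so the sketch assumes the conditional probabilities P(label | features) are already available, uses each probability directly as the document weight, and combines two document weights into a pair weight by multiplication. The function names `weights_from_label_probs` and `weighted_pairwise_logistic_loss` are hypothetical, and the loss is a generic RankNet-style pairwise logistic loss.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weights_from_label_probs(label_probs):
    """Turn estimated P(label | features) into per-document weights.

    Assumption (not from the paper): the weight equals the probability,
    so a label judged likely to be clean receives a larger weight.
    """
    for p in label_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return list(label_probs)


def weighted_pairwise_logistic_loss(scores, labels, weights):
    """RankNet-style pairwise logistic loss with noise-aware weights.

    Every pair (i, j) with labels[i] > labels[j] contributes
    w_i * w_j * log(1 + exp(-(s_i - s_j))). The product w_i * w_j is an
    illustrative choice for combining document weights into a pair weight.
    The result is normalized by the total pair weight.
    """
    total = 0.0
    norm = 0.0
    n = len(scores)
    for i in range(n):
        for j in range(n):
            if labels[i] > labels[j]:
                pair_weight = weights[i] * weights[j]
                margin = scores[i] - scores[j]
                total += pair_weight * softplus(-margin)
                norm += pair_weight
    return total / norm if norm > 0 else 0.0


# Toy query with three documents. The label of document 1 disagrees with
# the model scores and is assumed to have a low estimated probability.
scores = [2.0, 1.0, 0.0]
labels = [2, 0, 1]
uniform = weighted_pairwise_logistic_loss(scores, labels, [1.0, 1.0, 1.0])
noise_aware = weighted_pairwise_logistic_loss(
    scores, labels, weights_from_label_probs([1.0, 0.1, 1.0])
)
```

In this toy query, down-weighting the suspect document shrinks the contribution of the pairs it takes part in, so `noise_aware` comes out smaller than `uniform`. The same idea carries over to pointwise losses by weighting each document's term instead of each pair.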

Journal: ACM Transactions on Intelligent Systems and Technology
Publication Date: Oct 7, 2015
Citations: 11

Similar Papers

A noise-tolerant graphical model for ranking
Xiubo Geng ... Xue-Qi Cheng
Information Processing and Management | VOL. 48
27 Dec 2011

Impact of Noisy Labels on Dental Deep Learning-Calculus Detection on Bitewing Radiographs.
Martha Büttner ... Falk Schwendicke
Journal of clinical medicine | VOL. 12
23 Apr 2023

Robust Loss Functions for Learning Multi-class Classifiers
Himanshu Kumar ... P S Sastry
01 Oct 2018

Learning From Weakly Labeled Data Based on Manifold Regularized Sparse Model
Jia Zhang ... Shaozi Li
IEEE Transactions on Cybernetics | VOL. 52
02 Sep 2020
