An Optimization Framework for Remapping and Reweighting Noisy Relevance Labels

Yury Ustinovskiy,Valentina Fedorova,Pavel Serdyukov,Gleb Gusev

doi:10.1145/2911451.2911501

Abstract

Relevance labels is the essential part of any learning to rank framework. The rapid development of crowdsourcing platforms led to a significant reduction of the cost of manual labeling. This makes it possible to collect very large sets of labeled documents to train a ranking algorithm. However, relevance labels acquired via crowdsourcing are typically coarse and noisy, so certain consensus models are used to measure the quality of labels and to reduce the noise. This noise is likely to affect a ranker trained on such labels, and, since none of the existing consensus models directly optimizes ranking quality, one has to apply some heuristics to utilize the output of a consensus model in a ranking algorithm, e.g., to use majority voting among workers to get consensus labels. The major goal of this paper is to unify existing approaches to consensus modeling and noise reduction within a learning to rank framework. Namely, we present a machine learning algorithm aimed at improving the performance of a ranker trained on a crowdsourced dataset by proper remapping of labels and reweighting of samples. In the experimental part, we use several characteristics of workers/labels extracted via various consensus models in order to learn the remapping and reweighting functions. Our experiments on a large-scale dataset demonstrate that we can significantly improve state-of-the-art machine-learning algorithms by incorporating our framework.

Full Text