Abstract

We consider learning from ranking labels generated as follows: given a query set of samples in a dataset, a labeler ranks the samples with respect to her preference. The number of possible ranking labels grows exponentially with the query set size; most importantly, in practice, rankings often exhibit lower variance than class labels.

We first propose a new neural network architecture based on siamese networks that incorporates both class and comparison labels, i.e., rankings of sample pairs, in the same training pipeline via Bradley-Terry and Thurstone loss functions. Our architecture leads to a significant improvement in predicting both class and comparison labels, increasing classification AUC by as much as 35% and comparison AUC by as much as 6% on several real-life datasets. We further show that incorporating comparisons makes training from only a few samples possible: a deep neural network of 5.9 million parameters trained on 80 images attains a 0.92 AUC when comparisons are incorporated.

Furthermore, we tackle the problem of accelerating learning over the exponential number of rankings. We consider a ranking regression problem in which we learn Plackett-Luce scores as functions of sample features. We solve the maximum likelihood estimation problem using the Alternating Direction Method of Multipliers (ADMM), effectively separating the learning of scores from the learning of model parameters. This separation allows us to express the scores as the stationary distribution of a continuous-time Markov chain. Using this equivalence, we propose two spectral algorithms for ranking regression that learn shallow regression model parameters up to 579 times faster than Newton's method.

Finally, we bridge the gap between deep neural networks (DNNs) and efficient spectral algorithms that regress rankings under the Plackett-Luce model. We again solve the ranking regression problem via ADMM and thus express scores as the stationary distribution of a Markov chain. Moreover, we replace the standard l2-norm proximal penalty of ADMM with the Kullback-Leibler (KL) divergence, a more suitable measure of discrepancy for Plackett-Luce scores, which form a probability distribution; this significantly improves prediction performance. The resulting spectral algorithm is up to 175 times faster than siamese networks on four real-life datasets comprising ranking observations, while consistently attaining equivalent or better prediction performance, with up to 26% higher Top-1 accuracy and 6% higher Kendall-Tau correlation.

-- Author's abstract
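To make the comparison-label pipeline concrete, the following is a minimal sketch of a Bradley-Terry pairwise loss that a siamese scoring network could optimize alongside a standard classification loss. This is an illustrative PyTorch-style fragment; the names (f, x_i, x_j, y_pair) are assumptions for illustration, not the authors' implementation.

```python
import torch.nn.functional as F

def bradley_terry_loss(s_i, s_j, y):
    """Negative log-likelihood of one pairwise comparison.

    Under Bradley-Terry, P(i beats j) = sigmoid(s_i - s_j);
    y = 1.0 if sample i was ranked above sample j, else 0.0.
    (A Thurstone loss would use the Gaussian CDF in place of the sigmoid.)
    """
    return F.binary_cross_entropy_with_logits(s_i - s_j, y)

# Siamese usage (schematic): one scoring network f is applied to both
# samples of a pair, so comparison and class labels update the same weights:
#   s_i, s_j = f(x_i), f(x_j)
#   loss = bradley_terry_loss(s_i, s_j, y_pair) + classification_loss(x, y_class)
```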
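The spectral idea behind the score update can also be sketched: once ADMM decouples the scores from the regression model, Plackett-Luce scores are the stationary distribution of a Markov chain built from the observed comparisons. Below is an ILSR-style illustration of that equivalence for pairwise data; the function and variable names are ours, and the paper's algorithm additionally couples the scores to a regression model through ADMM.

```python
import numpy as np

def spectral_pl_scores(n, comparisons, iters=50):
    """Estimate Plackett-Luce scores as the stationary distribution of a
    continuous-time Markov chain (an illustrative ILSR-style iteration,
    not the paper's exact algorithm).

    comparisons: list of (winner, loser) index pairs over n items.
    """
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):
        # Each comparison "i beats j" adds transition rate 1/(pi_i + pi_j)
        # from the loser j to the winner i.
        Q = np.zeros((n, n))
        for i, j in comparisons:
            Q[j, i] += 1.0 / (pi[i] + pi[j])
        np.fill_diagonal(Q, -Q.sum(axis=1))
        # Stationary distribution: solve pi Q = 0 subject to sum(pi) = 1.
        A = np.vstack([Q.T, np.ones(n)])
        b = np.zeros(n + 1)
        b[-1] = 1.0
        pi, *_ = np.linalg.lstsq(A, b, rcond=None)
        pi = np.clip(pi, 1e-12, None)
        pi /= pi.sum()
    return pi
```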
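Schematically, the KL-proximal modification replaces the usual quadratic ADMM penalty in the score update with a KL term over the probability simplex. The notation below is ours and the exact splitting in the paper may differ; it is meant only to show where the KL divergence enters.

```latex
% Score update with the squared l2 penalty replaced by a KL penalty,
% where L is the Plackett-Luce likelihood and Delta the simplex:
\pi^{k+1} \;=\; \operatorname*{arg\,min}_{\pi \in \Delta}
  \; -\log \mathcal{L}(\pi) \;+\; \rho\,\mathrm{KL}\!\left(\pi \,\|\, z^{k}\right),
\qquad
\mathrm{KL}(\pi \,\|\, z) \;=\; \sum_i \pi_i \log \frac{\pi_i}{z_i}.
```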
