Abstract

Permutation-valued features arise in a variety of applications, either directly, when preferences are elicited over a collection of items, or indirectly, when numerical ratings are converted to a ranking. To date, there has been relatively limited study of regression, classification, and testing problems based on permutation-valued features, as opposed to permutation-valued responses. This paper studies the use of reproducing kernel Hilbert space methods for learning from permutation-valued features. These methods embed the rankings into an implicitly defined function space, and allow for efficient estimation of regression and test functions in this richer space. We characterize both the feature spaces and spectral properties associated with two kernels for rankings, the Kendall and Mallows kernels. Using tools from representation theory, we explain the limited expressive power of the Kendall kernel by characterizing its degenerate spectrum, and in sharp contrast, we prove that the Mallows kernel is universal and characteristic. We also introduce families of polynomial kernels that interpolate between the Kendall (degree one) and Mallows (infinite degree) kernels. We show the practical effectiveness of our methods via applications to Eurobarometer survey data as well as a MovieLens ratings dataset.

Highlights

  • Ranked data arises naturally in any context in which preferences are expressed over a collection of alternatives

  • We provided feature map and Fourier-analytic characterizations for various right-invariant kernels: the Kendall and Mallows kernels, and a novel family of polynomial kernels

  • We showed that the Kendall kernel is nearly degenerate in two ways: its Gram matrix has rank $\binom{d}{2}$
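The contrast between the two kernels can be checked numerically for a small number of items. The following is a minimal sketch, not the authors' code: it assumes the standard definitions, with the Kendall kernel taken as (concordant pairs minus discordant pairs) divided by d(d-1)/2 and the Mallows kernel as exp(-lambda * discordant pairs). The function names and the choice q = exp(-lambda) = 1/2 are illustrative assumptions, made so that every Gram entry is rational and the rank can be computed exactly.

```python
from fractions import Fraction
from itertools import combinations, permutations


def discordant_pairs(sigma, tau):
    """Count item pairs (i, j) that sigma and tau order differently."""
    return sum(
        1
        for i, j in combinations(range(len(sigma)), 2)
        if (sigma[i] - sigma[j]) * (tau[i] - tau[j]) < 0
    )


def kendall_kernel(sigma, tau):
    """(concordant - discordant) / (d choose 2), as an exact fraction."""
    d = len(sigma)
    n_pairs = d * (d - 1) // 2
    n_disc = discordant_pairs(sigma, tau)
    return Fraction(n_pairs - 2 * n_disc, n_pairs)


def mallows_kernel(sigma, tau, q=Fraction(1, 2)):
    """q ** discordant, where q stands for exp(-lambda) in (0, 1).

    q = 1/2 is an illustrative assumption that keeps entries rational.
    """
    return q ** discordant_pairs(sigma, tau)


def gram_matrix(kernel, d):
    """Gram matrix of `kernel` over all d! permutations of d items."""
    perms = list(permutations(range(d)))
    return [[kernel(s, t) for t in perms] for s in perms]


def exact_rank(matrix):
    """Matrix rank by Gaussian elimination over the rationals."""
    rows = [list(r) for r in matrix]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        pivot = next(
            (r for r in range(rank, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            if rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


d = 4
kendall_rank = exact_rank(gram_matrix(kendall_kernel, d))
mallows_rank = exact_rank(gram_matrix(mallows_kernel, d))
print(kendall_rank, mallows_rank)
```

For d = 4 there are 24 permutations. Under these assumptions the Kendall Gram matrix has rank 6 = 4·3/2, while the Mallows Gram matrix has full rank 24, which is the finite-sample counterpart of the degenerate versus universal distinction.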

Summary

Introduction

Ranked data arises naturally in any context in which preferences are expressed over a collection of alternatives. Familiar examples include election data, ratings of consumer items, and choices of schools. We consider datasets in which each covariate corresponds to a complete ranking over a set of d alternatives (that is, a permutation belonging to the symmetric group), and we study regression, classification, and testing problems with such data. As a concrete example, consider the Eurobarometer survey data, in which each respondent was asked to indicate their preferences over sources of information about scientific developments; the options were TV, radio, newspapers/magazines, scientific magazines, the internet, and school/university. Many natural questions arise from this dataset. Can we predict a person's age or gender from their ranking? Do men and women (or old and young respondents) have the same distribution over sources of information? The primary goal of this paper is to develop and analyze principled methods for answering such questions.

