Abstract

Representing distributions over permutations can be daunting because the number of permutations of n objects scales factorially in n. One recent approach to reducing storage complexity exploits probabilistic independence, but as we argue, full independence assumptions impose strong sparsity constraints on distributions and are unsuitable for modeling rankings. We identify a novel class of independence structures, called riffled independence, encompassing a more expressive family of distributions while retaining many of the properties necessary for performing efficient inference and reducing sample complexity. In riffled independence, one draws two permutations independently, then performs the riffle shuffle, common in card games, to combine the two permutations into a single permutation. Within the context of ranking, riffled independence corresponds to ranking disjoint sets of objects independently, then interleaving those rankings. In this paper, we provide a formal introduction to riffled independence and propose an automated method for discovering sets of items which are riffle independent from a training set of rankings. We show that our clustering-like algorithms can be used to discover meaningful latent coalitions from real preference ranking datasets and to learn the structure of hierarchically decomposable models based on riffled independence.
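To make the generative process concrete, here is a minimal Python sketch (our illustration, not code from the paper): two subset rankings are drawn independently and then merged under a randomly chosen interleaving, preserving the internal order of each subset as in a riffle shuffle of two packets of cards. The sampler names (`rank_a_sampler`, `rank_b_sampler`) are hypothetical placeholders, and the interleaving is drawn uniformly for simplicity; the general definition allows an arbitrary distribution over interleavings.

```python
import random

def riffle_independent_sample(rank_a_sampler, rank_b_sampler):
    """Draw a joint ranking of two disjoint item sets via riffled independence.

    The two subset rankings are drawn independently; a set of joint
    positions for the first subset is then drawn (uniformly here), and
    the rankings are merged with their internal orders preserved.
    """
    ranking_a = rank_a_sampler()  # independent draw over subset A
    ranking_b = rank_b_sampler()  # independent draw over subset B
    n = len(ranking_a) + len(ranking_b)

    # Each k-subset of the n joint positions is one possible interleaving.
    a_positions = set(random.sample(range(n), len(ranking_a)))

    it_a, it_b = iter(ranking_a), iter(ranking_b)
    return [next(it_a) if pos in a_positions else next(it_b)
            for pos in range(n)]

# Hypothetical usage: uniform rankings over two small item sets.
sample_fruits = lambda: random.sample(["apple", "banana", "cherry"], 3)
sample_veggies = lambda: random.sample(["kale", "leek"], 2)
print(riffle_independent_sample(sample_fruits, sample_veggies))
# e.g. ['kale', 'cherry', 'apple', 'leek', 'banana']
```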

Highlights

  • Ranked data appears ubiquitously in various statistics and machine learning application domains

  • We present a novel, relaxed notion of independence, called riffled independence, in which one ranks disjoint subsets of items independently, then interleaves the subset rankings to form a joint ranking of the full item set

  • Riffled independence appears naturally in many ranked datasets — as we show, political coalitions in elections often lead to pronounced riffled independence constraints in the vote histograms

Introduction

Ranked data appears ubiquitously in various statistics and machine learning application domains: for example, preference lists in surveys [21], search results in information retrieval applications [34], ballots in certain elections [7], and even the ordering of topics and paragraphs within a document [4]. As with many challenging learning problems, one must contend with an intractably large state space when dealing with rankings, since there are n! possible rankings of n items. In building a statistical model over rankings, simple (yet flexible) models are preferable because they are typically more computationally tractable and less prone to overfitting. A popular and highly successful approach for achieving such simplicity for distributions over large collections of interdependent variables has been to exploit conditional independence structure (e.g., naive Bayes, graphical models). For rankings, however, independence-based relations are harder to exploit due to the mutual exclusivity constraints which force any two items to map to different ranks in a given ranking.
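To give a sense of the scale involved, and of the savings a factored representation can offer, the following sketch (our own illustrative accounting, with hypothetical function names) compares tabulating a generic distribution over all n! rankings against storing the three factors of a riffled-independence decomposition: a distribution over rankings of a k-item subset, one over rankings of the remaining n - k items, and one over the C(n, k) ways to interleave them.

```python
from math import comb, factorial

def full_table_size(n):
    # One probability per ranking: n! values.
    return factorial(n)

def riffled_table_size(n, k):
    # Rankings of each subset plus the interleaving distribution:
    # k! + (n - k)! + C(n, k) values.
    return factorial(k) + factorial(n - k) + comb(n, k)

for n, k in [(10, 5), (16, 8), (20, 10)]:
    print(f"n={n:2d}: full {full_table_size(n):.3e}  "
          f"riffled (k={k}) {riffled_table_size(n, k):.3e}")
```

For n = 20 and k = 10, this works out to roughly 2.4 × 10^18 values for the full table versus about 7.4 × 10^6 for the factored one, which illustrates why structured representations are attractive in this setting.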
