Abstract
Representing distributions over permutations can be a daunting task because the number of permutations of n objects scales factorially in n. One recent approach to reducing storage complexity exploits probabilistic independence, but as we argue, full independence assumptions impose strong sparsity constraints on distributions and are unsuitable for modeling rankings. We identify a novel class of independence structures, called riffled independence, encompassing a more expressive family of distributions while retaining many of the properties necessary for performing efficient inference and reducing sample complexity. In riffled independence, one draws two permutations independently, then performs the riffle shuffle, common in card games, to combine the two permutations into a single permutation. In the context of ranking, riffled independence corresponds to ranking disjoint sets of objects independently, then interleaving those rankings. In this paper, we provide a formal introduction to riffled independence and propose an automated method for discovering sets of items which are riffle independent from a training set of rankings. We show that our clustering-like algorithms can be used to discover meaningful latent coalitions from real preference ranking datasets and to learn the structure of hierarchically decomposable models based on riffled independence.
Highlights
Ranked data appears ubiquitously in various statistics and machine learning application domains
We present a novel, relaxed notion of independence, called riffled independence, in which one ranks disjoint subsets of items independently, then interleaves the subset rankings to form a joint ranking of the item set
Riffled independence appears naturally in many ranked datasets — as we show, political coalitions in elections often lead to pronounced riffled independence constraints in the vote histograms
Summary
Ranked data appears ubiquitously in various statistics and machine learning application domains: for example, in reasoning about preference lists in surveys [21], search results in information retrieval applications [34], ballots in certain elections [7], and even the ordering of topics and paragraphs within a document [4]. As with many challenging learning problems, one must contend with an intractably large state space when dealing with rankings, since there are n! possible rankings of n items. In building a statistical model over rankings, simple (yet flexible) models are preferable because they are typically more computationally tractable and less prone to overfitting. A popular and highly successful approach for achieving such simplicity for distributions involving large collections of interdependent variables has been to exploit conditional independence structures (e.g., naive Bayes, graphical models). In the ranking setting, however, independence relations are harder to exploit because of mutual exclusivity constraints, which require any two items to map to different ranks in a given ranking.
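The generative view of riffled independence described above can be sketched in a few lines of code. The following is a minimal illustration, not the paper's algorithm: it samples each subset's internal ranking uniformly (any subset-level distributions would do) and picks the interleaving uniformly at random; the function names are hypothetical.

```python
import random

def interleave(rank_a, rank_b):
    """Riffle-shuffle two rankings together: pick which joint positions the
    first subset occupies, preserving each ranking's internal order."""
    n = len(rank_a) + len(rank_b)
    a_positions = set(random.sample(range(n), len(rank_a)))
    ia, ib = iter(rank_a), iter(rank_b)
    return [next(ia) if pos in a_positions else next(ib) for pos in range(n)]

def riffle_independent_sample(set_a, set_b):
    """Sample a joint ranking under riffled independence: rank each disjoint
    subset independently (uniformly here, purely for illustration), then
    interleave the two subset rankings."""
    rank_a = random.sample(set_a, len(set_a))
    rank_b = random.sample(set_b, len(set_b))
    return interleave(rank_a, rank_b)
```

Note that the joint ranking always restricts to a valid ranking of each subset: the interleaving decides only the positions, never the relative order within a subset.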