Abstract
Representing distributions over permutations can be a daunting task because the number of permutations of n objects scales factorially in n. One recent approach to reducing storage complexity exploits probabilistic independence, but as we argue, full independence assumptions impose strong sparsity constraints on distributions and are unsuitable for modeling rankings. We identify a novel class of independence structures, called riffled independence, encompassing a more expressive family of distributions while retaining many of the properties necessary for performing efficient inference and reducing sample complexity. In riffled independence, one draws two permutations independently, then performs the riffle shuffle, common in card games, to combine the two permutations into a single permutation. In the context of ranking, riffled independence corresponds to ranking disjoint sets of objects independently, then interleaving those rankings. In this paper, we provide a formal introduction to riffled independence and propose an automated method for discovering sets of items which are riffle independent from a training set of rankings. We show that our clustering-like algorithms can be used to discover meaningful latent coalitions from real preference ranking datasets and to learn the structure of hierarchically decomposable models based on riffled independence.
Highlights
Ranked data appears ubiquitously in various statistics and machine learning application domains
We present a novel, relaxed notion of independence, called riffled independence, in which one ranks disjoint subsets of items independently, then interleaves the subset rankings to form a joint ranking of the item set
Riffled independence appears naturally in many ranked datasets — as we show, political coalitions in elections often lead to pronounced riffled independence constraints in the vote histograms
Summary
Ranked data appears ubiquitously in various statistics and machine learning application domains: for example, in reasoning about preference lists in surveys [21], search results in information retrieval applications [34], ballots in certain elections [7], and even the ordering of topics and paragraphs within a document [4]. As with many challenging learning problems, one must contend with an intractably large state space when dealing with rankings, since there are n! possible rankings of n items. In building a statistical model over rankings, simple (yet flexible) models are preferable because they are typically more computationally tractable and less prone to overfitting. A popular and highly successful approach for achieving such simplicity for distributions involving large collections of interdependent variables has been to exploit conditional independence structures (e.g., naive Bayes, graphical models). In the ranking setting, however, independence relations are harder to exploit because of mutual exclusivity constraints, which require any two items to map to different ranks in a given ranking.
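The generative view of riffled independence described above can be sketched in a few lines of code. The following is a minimal illustration, not the paper's algorithm: it samples each subset's internal ranking uniformly (any subset-level distributions would do) and picks the interleaving uniformly at random; the function names are hypothetical.

```python
import random

def interleave(rank_a, rank_b):
    """Riffle-shuffle two rankings together: pick which joint positions the
    first subset occupies, preserving each ranking's internal order."""
    n = len(rank_a) + len(rank_b)
    a_positions = set(random.sample(range(n), len(rank_a)))
    ia, ib = iter(rank_a), iter(rank_b)
    return [next(ia) if pos in a_positions else next(ib) for pos in range(n)]

def riffle_independent_sample(set_a, set_b):
    """Sample a joint ranking under riffled independence: rank each disjoint
    subset independently (uniformly here, purely for illustration), then
    interleave the two subset rankings."""
    rank_a = random.sample(set_a, len(set_a))
    rank_b = random.sample(set_b, len(set_b))
    return interleave(rank_a, rank_b)
```

Note that the joint ranking always restricts to a valid ranking of each subset: the interleaving decides only the positions, never the relative order within a subset.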