Permutation-Invariant Consensus over Crowdsourced Labels

Michael Giancola, Randy Paffenroth, Jacob Whitehill

doi:10.1609/hcomp.v6i1.13326

Abstract

This paper introduces a novel crowdsourcing consensus model and inference algorithm, which we call PICA (Permutation-Invariant Crowdsourcing Aggregation), designed to recover the ground-truth labels of a dataset while remaining invariant to the class permutations enacted by different annotators. This is particularly useful for settings in which annotators may have systematic confusions about the meanings of different classes, and for clustering problems (e.g., dense pixel-wise image segmentation) in which the names or numbers assigned to each cluster have no inherent meaning. The PICA model is constructed by endowing each annotator with a doubly-stochastic matrix (DSM), which models the probabilities that an annotator will perceive one class and transcribe it into another. We conduct simulations and experiments to show the advantage of PICA compared to two baselines (Majority Vote and an "unpermutation" heuristic) for three different clustering/labeling tasks. We also explore the conditions under which PICA provides better inference accuracy than a simpler but related model based on right-stochastic matrices. Finally, we show that PICA can be used to crowdsource responses for dense image segmentation tasks, and we provide a proof of concept that aggregating responses in this way could improve the accuracy of this labor-intensive task.
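
To make the notion of a doubly-stochastic matrix concrete, the following is a minimal illustrative sketch, not the inference algorithm of the paper. It assumes that a DSM can be approximated by alternately normalizing the rows and columns of a positive matrix (Sinkhorn-style iteration); the function name `sinkhorn`, the iteration count, and the three-class setup are hypothetical choices made only for this example. It also shows that a permutation matrix, i.e. an annotator who consistently renames the classes, is a special case of a DSM.

```python
import numpy as np

def sinkhorn(scores, n_iters=200):
    """Approximate a doubly-stochastic matrix from arbitrary real scores by
    alternately normalizing rows and columns (illustrative assumption)."""
    m = np.exp(scores - scores.max())  # make every entry strictly positive
    for _ in range(n_iters):
        m = m / m.sum(axis=1, keepdims=True)  # each row sums to 1
        m = m / m.sum(axis=0, keepdims=True)  # each column sums to 1
    return m

rng = np.random.default_rng(0)

# Hypothetical annotator over 3 classes: entry (j, k) is read here as the
# probability of perceiving class j and transcribing it as class k.
dsm = sinkhorn(rng.normal(size=(3, 3)))

# A permutation matrix is the extreme case of a DSM: this annotator
# deterministically relabels class 0 -> 1, 1 -> 2, 2 -> 0.
perm = np.eye(3)[[1, 2, 0]]

print(np.round(dsm, 3))
print(dsm.sum(axis=0), dsm.sum(axis=1))
```

In contrast, a right-stochastic matrix only requires each row to sum to 1, so the column normalization step would be dropped.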

Full Text