Crowd labeling latent Dirichlet allocation

Luca Pion-Tonachini, Ken Kreutz-Delgado, Scott Makeig

doi:10.1007/s10115-017-1053-1

Open Access

Abstract

Large, unlabeled datasets are abundant nowadays, but obtaining labels for those datasets can be expensive and time-consuming. Crowd labeling is a crowdsourcing approach for gathering such labels from workers whose suggestions are not always accurate. While a variety of algorithms exist for this purpose, we present crowd labeling latent Dirichlet allocation (CL-LDA), a generalization of latent Dirichlet allocation that can solve a more general set of crowd labeling problems. We show that it performs as well as other methods, and at times better, on a variety of simulated and real datasets, while treating each label as compositional rather than as indicating a discrete class. In addition, prior knowledge of workers' abilities can be incorporated into the model through a structured Bayesian framework. We then apply CL-LDA to the EEG independent component labeling dataset, using its generalizations to further explore the utility of the algorithm. We discuss prospects for creating classifiers from the generated labels.
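The abstract describes CL-LDA only at a high level. As an illustration, here is a minimal sketch of an LDA-style crowd labeling model, assuming that each item carries a compositional label theta_i over K classes and that each worker has a confusion matrix whose rows are Dirichlet-distributed. Items play the role of documents, reported labels the role of words, and true classes the role of topics. Prior knowledge of worker ability is encoded by a hypothetical `worker_prior` that places extra Dirichlet mass on the diagonal, and inference uses a standard collapsed Gibbs sampler. All function names and parameter values are illustrative assumptions, not the authors' implementation or the exact CL-LDA model.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalizing independent Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def categorical(weights, rng):
    """Return index c with probability proportional to weights[c]."""
    return rng.choices(range(len(weights)), weights=weights)[0]

def worker_prior(K, diag, off):
    """Hypothetical prior on a confusion matrix: row k favours reporting class k."""
    return [[diag if j == k else off for j in range(K)] for k in range(K)]

def simulate(N, W, K, alpha, beta, rng):
    """Generate compositional item labels and one crowd label per (item, worker)."""
    theta = [sample_dirichlet([alpha] * K, rng) for _ in range(N)]
    # phi[w][k] is worker w's distribution over reported labels when the class is k.
    phi = [[sample_dirichlet(beta[k], rng) for k in range(K)] for _ in range(W)]
    labels = []  # (item, worker, reported label) triples, like words in documents
    for i in range(N):
        for w in range(W):
            z = categorical(theta[i], rng)  # latent class behind this label
            labels.append((i, w, categorical(phi[w][z], rng)))
    return theta, labels

def gibbs(labels, N, W, K, alpha, beta, iters, rng):
    """Collapsed Gibbs sampling of latent classes; returns an estimate of each theta."""
    z = [y for (_, _, y) in labels]  # initialize each latent class at the reported label
    n_ik = [[0] * K for _ in range(N)]  # latent-class counts per item
    m_wky = [[[0] * K for _ in range(K)] for _ in range(W)]  # worker confusion counts
    m_wk = [[0] * K for _ in range(W)]  # row totals of m_wky

    def update(i, w, y, k, delta):
        n_ik[i][k] += delta
        m_wky[w][k][y] += delta
        m_wk[w][k] += delta

    for (i, w, y), k in zip(labels, z):
        update(i, w, y, k, 1)
    beta_sum = [sum(row) for row in beta]
    for _ in range(iters):
        for t, (i, w, y) in enumerate(labels):
            update(i, w, y, z[t], -1)  # exclude this label's current assignment
            # p(z = c | rest) is proportional to item term times worker-confusion term.
            weights = [(n_ik[i][c] + alpha) * (m_wky[w][c][y] + beta[c][y])
                       / (m_wk[w][c] + beta_sum[c]) for c in range(K)]
            z[t] = categorical(weights, rng)
            update(i, w, y, z[t], 1)
    # Point estimate of theta_i from the final sample's counts plus the prior.
    return [[(n_ik[i][c] + alpha) / (sum(n_ik[i]) + K * alpha) for c in range(K)]
            for i in range(N)]

if __name__ == "__main__":
    rng = random.Random(0)
    K, N, W, alpha = 3, 60, 5, 0.2
    beta = worker_prior(K, diag=8.0, off=1.0)
    theta, labels = simulate(N, W, K, alpha, beta, rng)
    est = gibbs(labels, N, W, K, alpha, beta, iters=50, rng=rng)
    print([round(p, 2) for p in theta[0]], [round(p, 2) for p in est[0]])
```

Starting each latent assignment at the worker's reported label, together with the diagonal-heavy prior, helps keep class identities from permuting during sampling. The returned rows are compositional estimates rather than hard classes, so they could serve as soft targets for a downstream classifier.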
