Abstract

When we crowdsource judgements about the intelligibility of speech stimuli, the results are similar to those obtained under laboratory conditions, but contain substantially more noise. In this talk, we report on the extent to which self-reported variables such as background noise, headphone type, and hearing ability explain some of this variation in listener judgements. Our data come from a total of 7 data sets: 4 from studies conducted for a PhD thesis (Isaac, 2015; n = 276) and 3 collected from Mechanical Turk for the Blizzard Challenges in 2009, 2010, and 2011 (King and Karaiskos, 2009, 2010, 2011; n = 247). The statistical analysis, using generalised linear mixed models, focuses on two outcome variables: word error rate (WER) and perceived difficulty in understanding the sentences. We will discuss the implications of our findings for the design and analysis of large-scale crowdsourced speech intelligibility studies, framing the discussion with reference to current best practice in crowdsourcing perception studies.
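To make the first outcome variable concrete: WER is the word-level edit distance between a listener's transcript and the reference sentence, normalised by the reference length. The sketch below is a minimal illustration; the function name, whitespace tokenisation, and example strings are our own assumptions, not taken from the studies cited above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length.

    Illustrative sketch: assumes a non-empty reference and simple
    whitespace tokenisation (no punctuation or case normalisation).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = word-level edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# Two dropped words out of six: WER = 2/6 ≈ 0.33
print(word_error_rate("the cat sat on the mat", "the cat sat mat"))
```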
