Abstract

Objective: To assess crowdsourced responses in the evaluation of speech outcomes in children with velopharyngeal dysfunction (VPD).

Design: Fifty deidentified speech samples were compiled. Multiple pairwise comparisons obtained by crowdsourcing were used to produce a rank order of speech quality. Ratings of overall and specific speech characteristics were also collected. Twelve speech-language pathologists (SLPs) who specialize in VPD were asked to complete the same tasks. Crowds and experts completed each task on 2 separate occasions at least 1 week apart.

Setting: Online crowdsourcing platform.

Participants: Crowdsource raters were anonymous, at least 18 years of age, North American English speakers with self-reported normal hearing. Speech-language pathologists were recruited from multiple cleft/craniofacial teams.

Interventions: None.

Main Outcome Measures: Correlation of repeated assessments and comparison of crowd and SLP assessments.

Results: We obtained 6331 layperson assessments that met inclusion criteria via crowdsourcing within 8 hours. The crowds provided reproducible Elo rankings of speech quality, ρ(48) = .89, P < .0001, and consistent ratings of intelligibility and acceptability (intraclass correlation coefficient [ICC] = .87 and .92) on repeated assessments. Crowd rankings correlated significantly with those of the SLPs, ρ(10) = .86, P = .0003, as did crowd ratings (ICC = .75 and .79). Agreement between crowds and SLPs on more specific speech characteristics was moderate to weak (ICC < .65).

Conclusions: Crowdsourcing shows promise as a rapid way to obtain large numbers of speech assessments. Reliability of repeated assessments was acceptable. Large groups of naive raters yield evaluations of overall speech acceptability, intelligibility, and quality that are comparable to expert evaluations, but they are not consistent with expert raters for specific speech characteristics such as resonance and nasal air emission.
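The abstract does not give the implementation details of the Elo ranking procedure. The sketch below shows one standard way that crowdsourced pairwise comparisons of speech samples could be converted into a rank order of speech quality; the K-factor, starting rating, and function names are illustrative assumptions, not the authors' actual method.

```python
# Illustrative sketch only: converts pairwise "which sample sounds better" judgments
# into an Elo-based rank order. Parameter values (k=32, start_rating=1500) are assumed.

def expected_score(rating_a, rating_b):
    """Elo-model probability that sample A is preferred over sample B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_elo(ratings, winner, loser, k=32):
    """Update both ratings in place after one pairwise judgment (winner preferred)."""
    exp_win = expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * (1.0 - exp_win)
    ratings[loser] -= k * (1.0 - exp_win)

def rank_samples(sample_ids, comparisons, start_rating=1500.0, k=32):
    """Return sample IDs sorted from highest to lowest Elo rating.

    comparisons: iterable of (preferred_sample_id, other_sample_id) pairs
    collected from crowd raters.
    """
    ratings = {s: start_rating for s in sample_ids}
    for winner, loser in comparisons:
        update_elo(ratings, winner, loser, k)
    return sorted(sample_ids, key=lambda s: ratings[s], reverse=True)
```

Under this kind of scheme, reproducibility of the crowd rank order (as reported with Spearman's ρ in the Results) would be assessed by running the ranking on the two separate rating occasions and correlating the resulting orders.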
