Abstract

Much of what is known about prosody is based on native-speaker intuitions of idealized speech, or on prosodic annotations from expert annotators trained to interpret a visual display of f0. These approaches have been deployed to study prosody primarily in languages accessible to university researchers, and largely based on small, homogeneous speech samples from college-aged adult speakers. We describe an alternative approach, with coarse-grained annotations collected from a cohort of untrained annotators performing real-time Rapid Prosody Transcription (RPT) using LMEDS, an open-source software tool we developed to enable large-scale, crowd-sourced prosodic annotation over the internet. We compared nearly 100 lab-based and crowd-sourced RPT annotations for a 300-word, multi-talker sample of conversational American English, with annotators from the same (US) vs. different (Indian) dialect groups. Results show greater inter-annotator agreement for same-dialect annotators, and the best overall reliability from crowd-sourced US annotators. Statistical models show that a common set of acoustic and contextual factors predict prominence and boundary labels for all annotator groups. Overall, crowd-sourced prosodic annotation is shown to be efficient, and to rely on established cues to prosody, supporting its use for prosody research across languages, dialects, speaker populations, and speech genres.
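
As a minimal illustrative sketch (not code from the paper or from LMEDS), the snippet below shows one common way coarse-grained RPT labels are aggregated: each annotator marks each word as prominent (1) or not (0), a word-level prominence score is the proportion of annotators who marked it, and chance-corrected inter-annotator agreement can be summarized with Fleiss' kappa. The function names and toy data are invented for illustration.

```python
# Hypothetical sketch: aggregating binary RPT annotations and computing
# Fleiss' kappa as a measure of inter-annotator agreement.
from collections import Counter

def prominence_scores(annotations):
    """annotations: list of per-annotator binary label lists (one entry per word).
    Returns, for each word, the proportion of annotators who marked it prominent."""
    n_annotators = len(annotations)
    n_words = len(annotations[0])
    return [sum(a[w] for a in annotations) / n_annotators for w in range(n_words)]

def fleiss_kappa(annotations):
    """Fleiss' kappa for binary labels; items = words, raters = annotators."""
    n = len(annotations)             # number of raters
    words = list(zip(*annotations))  # each element: the labels one word received
    N = len(words)
    category_totals = Counter()
    per_word_agreement = []
    for labels in words:
        counts = Counter(labels)
        category_totals.update(counts)
        # proportion of rater pairs that agree on this word
        per_word_agreement.append((sum(c * c for c in counts.values()) - n) / (n * (n - 1)))
    P_bar = sum(per_word_agreement) / N
    P_e = sum((c / (N * n)) ** 2 for c in category_totals.values())
    return (P_bar - P_e) / (1 - P_e)

# Toy example: 4 annotators labeling 6 words (invented data).
toy = [
    [1, 0, 0, 1, 0, 1],
    [1, 0, 1, 1, 0, 1],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
]
print(prominence_scores(toy))        # [0.75, 0.0, 0.25, 1.0, 0.25, 0.75]
print(round(fleiss_kappa(toy), 3))   # 0.333
```

The same aggregation applies to boundary labels; in practice, agreement statistics like these would be compared across the lab-based and crowd-sourced annotator groups described above.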
