Abstract

Much of what is known about prosody is based on native-speaker intuitions of idealized speech, or on prosodic annotations from expert annotators trained to interpret a visual display of f0. These approaches have been deployed to study prosody primarily in languages accessible to university researchers, and largely based on small, homogeneous speech samples from college-aged adult speakers. We describe an alternative approach, with coarse-grained annotations collected from a cohort of untrained annotators performing real-time Rapid Prosody Transcription (RPT) using LMEDS, an open-source software tool we developed to enable large-scale, crowd-sourced prosodic annotation over the internet. We compared nearly 100 lab-based and crowd-sourced RPT annotations for a 300-word, multi-talker sample of conversational American English, with annotators from the same (US) vs. different (Indian) dialect groups. Results show greater inter-annotator agreement for same-dialect annotators, and the best overall reliability from crowd-sourced US annotators. Statistical models show that a common set of acoustic and contextual factors predict prominence and boundary labels for all annotator groups. Overall, crowd-sourced prosodic annotation is shown to be efficient, and to rely on established cues to prosody, supporting its use for prosody research across languages, dialects, speaker populations, and speech genres.
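
The abstract does not give implementation details, but the two quantities it turns on are RPT-style aggregate annotation scores and inter-annotator agreement. The sketch below (assumed, not taken from the paper) shows one way these could be computed: per-word prominence scores as the proportion of annotators who marked each word, and Fleiss' kappa as one possible agreement statistic. The toy annotation matrix, the fleiss_kappa function, and the choice of kappa over other reliability measures are all illustrative assumptions.

import numpy as np

def fleiss_kappa(labels):
    """Fleiss' kappa for binary annotations.
    labels: (n_items, n_raters) array of 0/1 marks (e.g., 1 = word marked prominent)."""
    labels = np.asarray(labels)
    n_items, n_raters = labels.shape
    # per-item counts for each category: [unmarked, marked]
    counts = np.stack([(labels == 0).sum(axis=1), (labels == 1).sum(axis=1)], axis=1)
    # observed per-item agreement, averaged over items
    p_i = (np.sum(counts ** 2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    # chance agreement from marginal category proportions
    p_j = counts.sum(axis=0) / (n_items * n_raters)
    p_e = np.sum(p_j ** 2)
    return (p_bar - p_e) / (1 - p_e)

# Toy example: 6 words x 4 annotators (invented data for illustration only).
marks = np.array([
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
])

p_scores = marks.mean(axis=1)   # proportion of annotators marking each word prominent
kappa = fleiss_kappa(marks)
print("per-word prominence scores:", p_scores)
print("Fleiss' kappa:", round(kappa, 3))

The same pattern applies to boundary labels: replace the prominence marks with boundary marks and the per-word scores and kappa are computed identically.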
