Abstract

Prosody is the supra-segmental aspect of speech that helps to convey the structure and intended meaning of lexical content unambiguously. The automatic detection of prosodic events, such as phrase boundaries and word prominence, has a number of applications in discourse analysis, where a combination of syntactic and acoustic-prosodic features is typically employed. This work addresses prosodic event detection in the context of assessing the oral reading skills of middle-school children. We discuss the observed characteristics of a specially created, labeled data set of oral reading recordings of English stories by non-native speakers. The resulting diversity of language skills adds to the known challenge of high speaker variability in the acoustic realization of prosodic events. A combination of knowledge-driven and data-driven feature selection is used to identify a compact set of word-level features from the acoustic correlates of prosody, considering different ways of incorporating the necessary temporal context. Benchmarked against a widely known prosodic event recognition system in a speaker-independent setup, the system achieves competitive performance with greatly reduced feature dimensionality. Because the features are interpretable, the predictor model's importance scores can be used to identify high-level speaker traits that influence the acoustic realization of prosodic events, suggesting a potential extension to systems that extract and exploit speaker idiosyncrasies for better prosodic event detection.

