Abstract

Irregular pitch periods (IPPs) occur in a wide variety of speech contexts and can support automatic speech recognition systems by signaling word boundaries, phrase endings, and certain prosodic contours. IPPs can also provide information about emotional content, dialect, and speaker identity. The ability to automatically detect IPPs is particularly useful because accurately identifying IPPs by hand is time-consuming and requires expertise. In this project, we use an algorithm developed for creaky voice analysis by Kane et al. (2013), incorporating features from Ishi et al. (2008), to automatically identify IPPs in recordings of speech from the American English Map Task database. Measures of short-term power, intra-frame periodicity, inter-pulse similarity, subharmonic energy, and glottal pulse peakiness are fed into an artificial neural network to generate frame-by-frame creak probabilities. To determine a perceptually relevant threshold probability, the creak probabilities are compared to IPPs hand-labeled by experienced raters. Preliminary results yielded an area under the receiver operating characteristic curve of 0.82. Thresholds above 0.1 produced very high specificity, and even lower thresholds yielded fairly high sensitivity and specificity. These results indicate generally good agreement between hand-labeled IPPs and automatic detection, and they call for future work investigating the effects of linguistic and prosodic context.
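
The comparison between frame-level creak probabilities and hand labels can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the toy probabilities, and the toy labels below are all hypothetical, and the 0.1 threshold is used only because the abstract mentions it. The sketch binarizes probabilities at a threshold to obtain sensitivity and specificity, and computes the area under the ROC curve as the probability that a randomly chosen IPP frame receives a higher score than a randomly chosen non-IPP frame.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    probs: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Binarize frame probabilities at `threshold` and score them against hand labels."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted_ipp = p >= threshold
        if predicted_ipp and y:
            tp += 1
        elif predicted_ipp and not y:
            fp += 1
        elif not predicted_ipp and y:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def roc_auc(probs: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via pairwise comparison of positive and negative frames (ties count half)."""
    pos = [p for p, y in zip(probs, labels) if y]
    neg = [p for p, y in zip(probs, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one IPP frame and one non-IPP frame")
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: one probability and one hand label (1 = IPP) per frame.
toy_probs = [0.05, 0.20, 0.30, 0.80, 0.60, 0.08, 0.01, 0.15]
toy_labels = [0, 0, 1, 1, 1, 1, 0, 0]

auc = roc_auc(toy_probs, toy_labels)
sens, spec = sensitivity_specificity(toy_probs, toy_labels, threshold=0.1)
print(f"AUC={auc:.3f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

The pairwise AUC is quadratic in the number of frames, which is fine for illustration; for full recordings a rank-based implementation from an established library would be the more practical choice.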
