Abstract

We present an approach for detecting rhythmic prominence in read speech. In a production experiment, subjects repeatedly read speech aloud to a metronome, trying to align stressed syllables with its beat. In the analysis, we compute a function from the speech waveform that is related to acoustic properties of speech such as specific loudness, pitch, voicing, and spectral slope. The function is then convolved with a Mexican Hat kernel. Taking large maxima of the convolved function as predictions of the metronome ticks, we adjust the parameters of the function to maximize the accuracy of the predictions. The parameters are adjusted by minimizing the phase variation between the metronome ticks and the ticks predicted from the audio, over a specified time interval. We confirm the results by bootstrap resampling. We find that the most important factor is the contrast in specific loudness between a syllable and its neighbors. Prominence can be estimated from the specific loudness in a window of approximately 360 ms centered on the syllable in question, relative to the specific loudness in a symmetric window approximately 800 ms wide.
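The filtering and peak-picking steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`mexican_hat`, `predict_ticks`, `phase_spread`), the kernel width, the peak threshold, and the synthetic Gaussian "loudness" contour are all assumptions made for illustration. The abstract does not say how phase variation is measured, so circular variance is used here as one common choice.

```python
import numpy as np

def mexican_hat(width_s, rate_hz):
    """Mexican Hat (Ricker) kernel; width_s is the Gaussian sigma in seconds (assumed)."""
    sigma = width_s * rate_hz            # sigma expressed in samples
    half = int(np.ceil(4 * sigma))       # truncate the kernel at +/- 4 sigma
    t = np.arange(-half, half + 1) / sigma
    kernel = (1.0 - t**2) * np.exp(-0.5 * t**2)
    return kernel - kernel.mean()        # zero mean: a flat contour filters to 0

def predict_ticks(contour, rate_hz, width_s=0.1, threshold=0.5):
    """Times (s) of local maxima of the filtered contour above a hypothetical threshold."""
    filtered = np.convolve(contour, mexican_hat(width_s, rate_hz), mode="same")
    filtered = filtered / np.max(np.abs(filtered))   # normalize so the largest value is 1
    mid = filtered[1:-1]
    is_peak = (mid > filtered[:-2]) & (mid >= filtered[2:]) & (mid > threshold)
    return (np.flatnonzero(is_peak) + 1) / rate_hz

def phase_spread(ticks_s, period_s):
    """Circular variance of tick phases relative to a metronome period (assumed measure)."""
    phases = 2 * np.pi * (ticks_s % period_s) / period_s
    return 1.0 - np.abs(np.mean(np.exp(1j * phases)))

# Synthetic stand-in for an acoustic contour: one bump per stressed syllable.
rate_hz = 100
t = np.arange(300) / rate_hz
beats = np.array([0.5, 1.5, 2.5])
contour = sum(np.exp(-0.5 * ((t - b) / 0.08) ** 2) for b in beats)

ticks = predict_ticks(contour, rate_hz)
spread = phase_spread(ticks, period_s=1.0)
```

In a parameter search of the kind the abstract describes, a quantity like `spread` would serve as the objective to minimize while the parameters that define the contour are varied.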
