Abstract

Speech research using lingual ultrasound often requires both the ultrasound images and the corresponding acoustic speech signal, but synchronization between these signals is not always available. We propose that periods of matching rates of change in the two signals could be used to align articulatory and acoustic signals where hardware synchronization is unavailable, or to verify an existing synchronization. In this study, we analyze pre-synchronized ultrasound and audio recordings of English speakers reading words. For each recording, we calculated articulatory change as the change in pixel brightness values over time and acoustic change as the change in Mel-frequency cepstral coefficient (MFCC) representations of the audio, and then calculated the correlation between the acoustic and articulatory change over a window shifting through the recording. We then deliberately offset the signals in increments of 5 ms to verify that the known synchronization yields the strongest correlations. We manipulated several other variables, and preliminary results suggest that shorter window lengths and restricting the correlation analysis to detected speech produce the most accurate alignments. Analysis is ongoing to determine whether the duration of correlation (number of windows with high r-values) or the overall degree of correlation (median r-values) leads to the most accurate alignments.
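
To make the procedure concrete, below is a minimal Python sketch of the alignment approach described above. It assumes the ultrasound frames are available as a (frames, height, width) NumPy array at a known frame rate and uses librosa for the MFCCs; the function names, the 13-coefficient MFCC setting, the 20-frame window, and the ±100 ms search range are illustrative assumptions, not the study's actual implementation.

```python
# Sketch of windowed articulatory-acoustic correlation for audio-video
# alignment. Parameter values are illustrative assumptions.
import numpy as np
import librosa


def articulatory_change(frames):
    """Mean absolute change in pixel brightness between consecutive frames."""
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0))
    return diffs.mean(axis=(1, 2))  # one value per frame transition


def acoustic_change(audio, sr, fps):
    """Frame-to-frame Euclidean change in MFCCs, one column per video frame."""
    hop = int(sr / fps)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13, hop_length=hop)
    return np.linalg.norm(np.diff(mfcc, axis=1), axis=0)


def windowed_correlation(art, ac, win):
    """Pearson r between the two change signals over a shifting window."""
    n = min(len(art), len(ac))
    return np.array([np.corrcoef(art[i:i + win], ac[i:i + win])[0, 1]
                     for i in range(n - win)])


def best_offset(frames, audio, sr, fps, max_offset_ms=100, step_ms=5, win=20):
    """Offset the audio in 5 ms steps; keep the offset whose windowed
    correlations with the articulatory change have the highest median."""
    art = articulatory_change(frames)
    scores = {}
    for offset_ms in range(-max_offset_ms, max_offset_ms + 1, step_ms):
        shift = int(abs(offset_ms) / 1000 * sr)
        # Positive offset trims audio (audio leads); negative pads (audio lags).
        shifted = audio[shift:] if offset_ms >= 0 else np.pad(audio, (shift, 0))
        ac = acoustic_change(shifted, sr, fps)
        r = windowed_correlation(art, ac, win)
        scores[offset_ms] = np.nanmedian(r)
    return max(scores, key=scores.get)  # estimated offset in ms
```

Replacing the median in the scoring step with a count of windows whose r-value exceeds a threshold would implement the alternative "duration of correlation" criterion mentioned above.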
