Abstract

While ultrasound imaging has made articulatory phonetics more accessible, quantitative analysis of ultrasound data often reduces speech sounds to tongue contours traced from single video frames, disregarding the temporal aspect of speech. We propose a tracing-free method for directly converting entire ultrasound videos to phonetically interpretable articulatory signals using Principal Component Analysis of image data (Hueber et al. 2007). Once a batch of ultrasound images (e.g., 36,000 frames from 10 min at 60 fps) has been reduced to 20 principal components, numerous techniques are available for deriving temporally changing articulatory signals that are both phonetically meaningful and comparable across speakers. Here we apply a regression model to find the linear combination of PCs that is the lingual articulatory analog of the front diagonal of the acoustic vowel space (Z2-Z1). We demonstrate this technique with a study of /æ/ tensing in 20 speakers of North American English varieties with different tensing environments (Labov 2005). Our results show that /m n/ condition a tongue raising gesture that is aligned to the vowel nucleus, while /ɡ/ conditions anticipatory raising toward the velar target. /ŋ/ patterns consistently with the other velar rather than the other nasals.
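The pipeline described above (flatten each frame, reduce to 20 principal components, then regress an acoustic measure on the PC scores) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the function names, the 16x16 frame size, the sinusoidal "tongue height" latent, and the use of ordinary least squares are all assumptions made for the example, and `z_diag` is only a stand-in for a per-frame Z2-Z1 value.

```python
import numpy as np

def pca_scores(frames, n_components=20):
    """Flatten frames of shape (n_frames, height, width) and project onto top PCs."""
    X = frames.reshape(frames.shape[0], -1).astype(float)
    mean = X.mean(axis=0)
    Xc = X - mean  # center each pixel across frames
    # Economy SVD: rows of Vt are the principal axes in pixel space
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T  # shape (n_frames, n_components)
    return scores, components, mean

def fit_articulatory_signal(scores, target):
    """Ordinary least squares (an assumed choice): intercept plus one weight per PC."""
    A = np.column_stack([np.ones(len(scores)), scores])
    weights, *_ = np.linalg.lstsq(A, target, rcond=None)
    return weights

def articulatory_signal(scores, weights):
    """Apply the fitted linear combination of PCs to every frame."""
    return weights[0] + scores @ weights[1:]

# Synthetic stand-in data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
n_frames, height, width = 300, 16, 16
latent = np.sin(np.linspace(0, 6 * np.pi, n_frames))   # hidden "tongue height"
pattern = rng.normal(size=(height, width))             # image pattern it drives
frames = latent[:, None, None] * pattern + 0.1 * rng.normal(size=(n_frames, height, width))
z_diag = 2.0 * latent + 0.05 * rng.normal(size=n_frames)  # stand-in for Z2-Z1

scores, components, mean = pca_scores(frames, n_components=20)
weights = fit_articulatory_signal(scores, z_diag)
signal = articulatory_signal(scores, weights)
r = np.corrcoef(signal, z_diag)[0, 1]  # how well the articulatory signal tracks the target
```

In a real study, the regression would presumably be fit only on frames with reliable acoustic measurements, and the fitted weights then applied to all frames to obtain a continuous articulatory signal.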
