Abstract

Decades of research on visual information in speech have demonstrated that visual cues can enhance (Cho et al., 2020; Kawase et al., 2014) and influence (MacDonald and McGurk, 1978) speech perception. However, it is unclear which specific visual cues are spatially and temporally correlated with particular features of speech segments. This study explored visual cues to voicing during the production of Canadian English stops using facial recognition technology (Baltrušaitis et al., 2018) and manual coding (using ELAN 2022). We recorded audio along with two videos, capturing front and side views simultaneously, from six native speakers of Canadian English. We paid special attention to the throat (larynx), chin, and neck areas, which have been understudied in previous literature. Preliminary data show expanding movement in the submental triangle and throat during the production of voiced stops compared to voiceless stops. This finding is consistent with the tongue body lowering and larynx lowering reported in the production of English voiced stops (Westbury, 1983). A comparison of utterance-initial stops with post-vocalic stops shows that certain visual cues may be related to phonological voicing categorization irrespective of actual voicing during closure, while others reflect phonetic voicing reality.
