Abstract

Talkers are known to produce allophonic variation based, in part, on the speech of the person with whom they are talking. This subtle imitation, or phonetic alignment, occurs during live conversation and when a talker is asked to shadow recorded words [e.g., Shockley et al., Percept. Psychophys. 66, 422 (2004)]. What has yet to be determined is the nature of the information to which talkers align. To examine whether this information is restricted to the acoustic modality, experiments were conducted to test whether talkers align to visual speech (lipread) information. Normal-hearing subjects were asked to watch an actor silently utter words and to identify these words by saying them out loud as quickly as possible. These shadowed responses were audio recorded, and naive raters compared them to the actor's auditory words (which had been recorded along with the actor's visual tokens). Raters judged the shadowed words as sounding more like the actor's words than did baseline words, which had been spoken by subjects before the shadowing task. These results show that phonetic alignment can be based on visual speech, suggesting that its informational basis is not restricted to the acoustic signal.
