Abstract
In natural conversation, turns are handed off quickly, with the mean gap between turns commonly ranging from 7 to 423 ms. To achieve this, speakers plan their upcoming speech as their partner’s turn unfolds, holding the audible utterance in abeyance until socially appropriate. The role played by prediction is debated, with some researchers claiming that speakers predict upcoming speech opportunities, and others claiming that speakers wait for detection of turn-final cues. The dynamics of articulatory triggering may speak to this debate. It is often assumed that the prepared utterance is held in a response buffer and then initiated all at once. This assumption is consistent with standard phonetic models in which articulatory actions must follow tightly prescribed patterns of coordination. This assumption has recently been challenged by single-word production experiments in which participants partly positioned their articulators to anticipate upcoming utterances, long before starting the acoustic response. The present study considered whether similar anticipatory postures arise when speakers in conversation await their next opportunity to speak. We analyzed a pre-existing audiovisual database of dyads engaging in unstructured conversation. Video motion tracking was used to determine speakers’ lip areas over time. When utterance-initial syllables began with labial consonants or included rounded vowels, speakers produced distinctly smaller lip areas (compared to other utterances) prior to audible speech. This effect was moderated by the number of words in the upcoming utterance; postures arose up to 3,000 ms before acoustic onset for short utterances of 1–3 words. We discuss the implications for models of conversation and phonetic control.
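To make the lip-area measure concrete, the sketch below shows one generic way such a quantity could be computed from video motion tracking. This is a minimal illustration, not the study's actual pipeline: the landmark format, the shoelace-formula computation, and the pixel units are all assumptions introduced here for exposition.

```python
import numpy as np

def lip_area(points: np.ndarray) -> float:
    """Area enclosed by lip-contour landmarks (hypothetical format).

    `points` is assumed to be an (N, 2) array of (x, y) coordinates
    ordered around the lip contour, as a generic motion-tracking
    system might provide. Uses the shoelace formula; units are
    squared pixels unless the landmarks are calibrated.
    """
    x, y = points[:, 0], points[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

# Illustrative use: compute area per video frame, then compare the
# window preceding acoustic onset across utterance types
# (labial/rounded-initial vs. other).
# frames: list of (N, 2) landmark arrays, one per frame
# areas = np.array([lip_area(f) for f in frames])
```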
Highlights
Successful spoken communication requires navigating two overlapping sets of temporal constraints.
On the other hand, there is what might be called situational timing: how phonetic events are timed against the background grid of the environment, including others’ speech.
We will argue that understanding inter-speaker coordination requires re-evaluating this assumption. Such coordination may arise when speakers apply situational timing mechanisms to aspects of the utterance traditionally viewed as the domain of phonological timing.
Summary
Successful spoken communication requires navigating two overlapping sets of temporal constraints. There is what might be called phonological timing: how the flow of articulatory events gives rise to intelligible speech. On the other hand, there is what might be called situational timing: how phonetic events are timed against the background grid of the environment, including others’ speech. Situational timing is key to inter-speaker coordination. As we will outline below, most extant speech models assume that phonological and situational timing are governed by distinct cognitive mechanisms. We will argue that understanding inter-speaker coordination requires re-evaluating this assumption. Such coordination may arise when speakers apply situational timing mechanisms to aspects of the utterance traditionally viewed as the domain of phonological timing.