Abstract

Speech production involves movement of the mouth and other regions of the face, resulting in visual motion cues. These visual cues enhance the intelligibility and detection of auditory speech. As such, face-to-face speech is fundamentally a multisensory phenomenon. If speech is fundamentally multisensory, this should be reflected in the evolution of vocal communication: similar behavioral effects should be observed in other primates. Old World monkeys share vocal production biomechanics with humans and communicate face-to-face with vocalizations. It is unknown, however, whether they, too, combine faces and voices to enhance their perception of vocalizations. We show that they do: monkeys combine faces and voices in noisy environments to enhance their detection of vocalizations. Their behavior parallels that of humans performing an identical task. We explored what common computational mechanism(s) could explain the pattern of results observed across species. Standard explanations, such as the principle of inverse effectiveness and a “race” model, failed to account for the observed behavior patterns. Conversely, a “superposition model”, which posits the linear summation of activity patterns in response to the visual and auditory components of vocalizations, provided a straightforward but powerful explanation for the behaviors of both species. As such, it represents a putative homologous mechanism for integrating faces and voices across primates.
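
To make the contrast between the two candidate mechanisms concrete, the sketch below simulates detection times under a “race” rule (independent auditory and visual detectors, with the fastest one determining the response) and under “superposition” (auditory and visual evidence summing linearly in a single accumulator). This is an illustrative sketch only, not the study's analysis code: the drift rates, noise level, threshold, and the accumulator framing itself are assumed values chosen for demonstration.

    # Hypothetical illustration: race vs. superposition (linear summation) accounts
    # of audiovisual detection. All parameter values are assumptions, not fitted data.
    import numpy as np

    rng = np.random.default_rng(1)
    N = 20_000        # simulated trials per condition
    THRESHOLD = 1.0   # arbitrary detection bound
    DT = 0.001        # time step (s)
    SIGMA = 0.3       # accumulator noise

    def detection_times(drift, n=N):
        """Return the first time a noisy accumulator with the given drift crosses THRESHOLD."""
        evidence = np.zeros(n)
        rt = np.full(n, np.inf)
        running = np.ones(n, dtype=bool)
        t = 0.0
        while running.any() and t < 5.0:
            t += DT
            evidence[running] += drift * DT + SIGMA * np.sqrt(DT) * rng.standard_normal(running.sum())
            hit = np.zeros(n, dtype=bool)
            hit[running] = evidence[running] >= THRESHOLD
            rt[hit] = t
            running &= ~hit
        return rt

    rt_aud = detection_times(drift=1.2)          # auditory component alone (assumed drift)
    rt_vis = detection_times(drift=0.8)          # visual component alone (assumed drift)
    rt_race = np.minimum(rt_aud, rt_vis)         # race: independent channels, fastest wins
    rt_super = detection_times(drift=1.2 + 0.8)  # superposition: drifts sum in one accumulator

    for label, rt in [("auditory", rt_aud), ("visual", rt_vis),
                      ("race", rt_race), ("superposition", rt_super)]:
        print(f"{label:>13}: median detection time {np.median(rt):.3f} s")

Under these toy parameters, the summed-drift accumulator typically reaches the bound faster than either channel alone and faster than the race of the two independent channels, which is the qualitative signature that separates a superposition account from a race account; none of the numbers correspond to values reported in the paper.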

Highlights

  • When we speak, our face moves and deforms the mouth and other regions [1,2,3,4,5]

  • Subjects: two adult male macaques (Macaca fascicularis), born in captivity and provided with various sources of enrichment, including cartoons displayed on a large-screen TV as well as olfactory, auditory, and visual contact with conspecifics

Introduction

When we speak, our face moves and deforms the mouth and other regions [1,2,3,4,5]. These dynamics and deformations lead to a variety of visual motion cues (“visual speech”) related to the auditory components of speech and are integral to face-to-face communication. In real-world environments, visual speech provides considerable intelligibility benefits for the perception of auditory speech [6,7], speeds reaction times [8,9], and is hard to ignore, integrating readily and automatically with auditory speech [10]. In the primate lineage, both the number and diversity of muscles innervating the face [12,13,14] and the amount of neural control related to facial movement [15,16,17,18] increased over time relative to other taxa. This allowed the production of a greater diversity of facial and vocal expressions in primates [19], with different patterns of facial motion uniquely linked to different vocal expressions [20,21]. For example, coo calls, like the /u/ in speech, are produced with the lips protruded, while screams, like the /i/ in speech, are produced with the lips retracted [20].
