Abstract

Cortex in and around the human posterior superior temporal sulcus (pSTS) is known to be critical for speech perception. The pSTS responds to both the visual modality (especially biological motion) and the auditory modality (especially human voices). Using fMRI in single subjects with no spatial smoothing, we show that visual and auditory selectivity are linked. Regions of the pSTS were identified that preferred visually presented moving mouths (presented in isolation or as part of a whole face) or moving eyes. Mouth-preferring regions responded strongly to voices and showed a significant preference for vocal compared with nonvocal sounds. In contrast, eye-preferring regions did not respond to either vocal or nonvocal sounds. The converse was also true: regions of the pSTS that showed a significant response to speech or preferred vocal to nonvocal sounds responded more strongly to visually presented mouths than eyes. These findings can be explained by environmental statistics. In natural environments, humans see visual mouth movements at the same time as they hear voices, while there is no auditory accompaniment to visual eye movements. The strength of a voxel's preference for visual mouth movements was strongly correlated with the magnitude of its auditory speech response and its preference for vocal sounds, suggesting that visual and auditory speech features are coded together in small populations of neurons within the pSTS.SIGNIFICANCE STATEMENT Humans interacting face to face make use of auditory cues from the talker's voice and visual cues from the talker's mouth to understand speech. The human posterior superior temporal sulcus (pSTS), a brain region known to be important for speech perception, is complex, with some regions responding to specific visual stimuli and others to specific auditory stimuli. Using BOLD fMRI, we show that the natural statistics of human speech, in which voices co-occur with mouth movements, are reflected in the neural architecture of the pSTS. Different pSTS regions prefer visually presented faces containing either a moving mouth or moving eyes, but only mouth-preferring regions respond strongly to voices.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.