Abstract

Facial and vocal cues provide critical social information about other humans, including their emotional and attentional states and the content of their speech. Recent work has shown that the face-responsive region of posterior superior temporal sulcus (“fSTS”) also responds strongly to vocal sounds. Here, we investigate the functional role of this region and the broader STS by measuring responses to a range of face movements, vocal sounds, and hand movements using fMRI. We find that the fSTS responds broadly to different types of audio and visual face action, including both richly social communicative actions and minimally social noncommunicative actions, ruling out hypotheses of specialization for processing speech signals, or communicative signals more generally. Strikingly, however, responses to hand movements were very low, whether communicative or not, indicating a specific role in the analysis of face actions (facial and vocal), not a general role in the perception of any human action. Furthermore, spatial patterns of response in this region were able to decode communicative from noncommunicative face actions, both within and across modality (facial/vocal cues), indicating sensitivity to an abstract social dimension. These functional properties of the fSTS contrast with a region of middle STS that has a selective, largely unimodal auditory response to speech sounds over both communicative and noncommunicative vocal nonspeech sounds, as well as nonvocal sounds. Region-of-interest analyses were corroborated by a data-driven independent component analysis, which identified face-voice and auditory speech responses as dominant sources of voxelwise variance across the STS. These results suggest that the STS contains separate processing streams for the audiovisual analysis of face actions and for auditory speech processing.

Highlights

  • We learn a great deal about the character, thoughts, and emotions of another person by watching their face and listening to their voice

  • Considering the response profiles of the fSTS and vSTS together, our results indicate that the superior temporal sulcus (STS) contains distinct pathways for 1) processing of facial and vocal signals in general, and 2) processing of speech signals

  • While we designate the regions studied here as fSTS and vSTS based on the functional criteria used to define them, these results suggest that fvSTS and spSTS would be more appropriate names


Introduction

We learn a great deal about the character, thoughts, and emotions of another person by watching their face and listening to their voice. Within the posterior STS (pSTS), neuroimaging studies have reliably observed visual responses to perceived face movements (Allison et al., 2000; Bernstein et al., 2018; Pelphrey et al., 2005; Pitcher et al., 2011; Puce et al., 1998; Schultz et al., 2013), as well as spatial patterns of response that discriminate between types of face movement (Deen and Saxe, 2019; Said et al., 2010; Srinivasan et al., 2016). These observations have led to the hypothesis that the STS contains a dorsal stream for face processing, specialized for extracting dynamic information from face motion and distinct from a static form pathway on the ventral surface (Bernstein and Yovel, 2015; Freiwald et al., 2016).

