Abstract

The human superior temporal sulcus (STS) is responsive to visual and auditory information, including sounds and facial cues during speech recognition. We investigated the functional organization of STS with respect to modality-specific and multimodal speech representations. Twenty younger adult participants performed an oddball detection task while being presented with auditory, visual, and audiovisual speech stimuli, as well as auditory and visual nonspeech control stimuli, in a block fMRI design. Consistent with a hypothesized anterior-posterior processing gradient in STS, auditory, visual, and audiovisual stimuli produced the largest BOLD effects in anterior, posterior, and middle STS (mSTS), respectively, based on whole-brain, linear mixed effects, and principal component analyses. Notably, the mSTS exhibited preferential responses to multisensory stimulation, as well as to speech compared to nonspeech. Within the mid-posterior and mSTS regions, response preferences changed gradually from visual, to multisensory, to auditory, moving from posterior to anterior. Post hoc analysis of visual regions in the posterior STS revealed that a single subregion bordering the mSTS was insensitive to differences in low-level motion kinematics yet distinguished between visual speech and nonspeech based on multi-voxel activation patterns. These results suggest that auditory and visual speech representations are elaborated gradually within anterior and posterior processing streams, respectively, and may be integrated within the mSTS, which is sensitive to more abstract speech information within and across presentation modalities. The spatial organization of STS is consistent with processing streams that are hypothesized to synthesize perceptual speech representations from sensory signals that provide convergent information from visual and auditory modalities.
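As an illustration of the multi-voxel pattern analysis mentioned above, the minimal sketch below classifies visual speech versus nonspeech blocks from the activation pattern across voxels in a single region of interest. The data are simulated, and the ROI size, block counts, classifier, and cross-validation scheme are assumptions chosen for illustration; they are not the paper's actual decoding pipeline, which is not described in this abstract.

# Minimal MVPA sketch (Python, scikit-learn): decode visual speech vs. nonspeech
# from multi-voxel activation patterns within one hypothetical STS subregion.
# All data below are simulated; parameters are illustrative assumptions.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score, StratifiedKFold

rng = np.random.default_rng(0)

n_voxels = 150          # voxels in the hypothetical posterior-STS ROI
n_blocks_per_cond = 20  # block-wise activation estimates per condition

# Simulated block-wise patterns: a weak, distributed difference between
# visual speech and nonspeech facial gestures.
speech = rng.normal(0.0, 1.0, (n_blocks_per_cond, n_voxels)) + rng.normal(0.0, 0.3, n_voxels)
nonspeech = rng.normal(0.0, 1.0, (n_blocks_per_cond, n_voxels))

X = np.vstack([speech, nonspeech])
y = np.array([1] * n_blocks_per_cond + [0] * n_blocks_per_cond)

# Cross-validated linear classifier; accuracy reliably above 50% indicates
# that the ROI's multi-voxel pattern carries condition information.
clf = LinearSVC(max_iter=10000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
acc = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
print(f"Mean decoding accuracy: {acc.mean():.2f} (chance = 0.50)")

In practice, one pattern per block would come from the fMRI model's beta estimates for the ROI voxels, and significance would be assessed against chance (e.g., by permutation), but that is beyond the scope of this sketch.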

Highlights

  • The superior temporal sulcus (STS) is activated during a variety of perceptual tasks including audiovisual integration (Beauchamp et al., 2004b; Amedi et al., 2005), speech perception (Binder et al., 2000, 2008; Hickok and Poeppel, 2004, 2007; Price, 2010), and biological motion perception (Allison et al., 2000; Grossman et al., 2000, 2005; Grossman and Blake, 2002; Beauchamp et al., 2003; Puce and Perrett, 2003).

  • It has been widely established that auditory speech perception is influenced by visual speech information (Sumby and Pollack, 1954; McGurk and MacDonald, 1976; Dodd, 1977; Reisberg et al., 1987; Callan et al., 2003), which is represented in part within biological motion circuits that specify the shape and position of vocal tract articulators.

  • In the present fMRI study, we set out to answer two questions concerning the organization of multisensory speech streams in the superior temporal sulcus (STS): (1) Does activation follow a posterior-to-anterior gradient from facial motion processing regions, to multisensory speech regions, to auditory regions? And, if so, (2) where along this gradient do speech-specific representations emerge in the STS; in particular, do posterior-visual regions of the STS play a role in speech processing? To answer these questions we presented participants with a variety of speech and nonspeech conditions: auditory speech, visual speech, audiovisual speech, spectrally rotated speech, and nonspeech facial gestures.



Introduction

The superior temporal sulcus (STS) is activated during a variety of perceptual tasks including audiovisual integration (Beauchamp et al., 2004b; Amedi et al., 2005), speech perception (Binder et al., 2000, 2008; Hickok and Poeppel, 2004, 2007; Price, 2010), and biological motion perception (Allison et al., 2000; Grossman et al., 2000, 2005; Grossman and Blake, 2002; Beauchamp et al., 2003; Puce and Perrett, 2003). It has been widely established that auditory speech perception is influenced by visual speech information (Sumby and Pollack, 1954; McGurk and MacDonald, 1976; Dodd, 1977; Reisberg et al., 1987; Callan et al., 2003), which is represented in part within biological motion circuits that specify the shape and position of vocal tract articulators. This high-level visual information is hypothesized to interact with auditory speech representations in the STS (Callan et al., 2003). Human functional neuroimaging evidence supports the notion that the STS is a multisensory convergence zone for speech (Calvert et al., 2000; Wright et al., 2003; Beauchamp et al., 2004a, 2010; Szycik et al., 2008; Stevenson and James, 2009; Stevenson et al., 2010, 2011; Nath and Beauchamp, 2011, 2012).

