Abstract

Human speech has a unique capacity to carry and communicate rich meanings. However, it is not known how this highly dynamic and variable perceptual signal is mapped to existing linguistic and semantic representations. In a novel approach, we used the natural acoustic variability of sounds and mapped them to magnetoencephalography (MEG) data using physiologically inspired machine-learning models. We aimed to determine how well the models, which differed in their representation of temporal information, serve to decode and reconstruct spoken words from MEG recordings in 16 healthy volunteers. We discovered that dynamic time-locking of the cortical activation to the unfolding speech input is crucial for the encoding of the acoustic-phonetic features of speech. In contrast, time-locking was not highlighted in cortical processing of non-speech environmental sounds that conveyed the same meanings as the spoken words, including human-made sounds with temporal modulation content similar to speech. The amplitude envelope of the spoken words was particularly well reconstructed from cortical evoked responses. Our results indicate that speech is encoded cortically with especially high temporal fidelity. This speech tracking by evoked responses may partly reflect the same underlying neural mechanism as the frequently reported entrainment of cortical oscillations to the amplitude envelope of speech. Furthermore, the phoneme content was reflected in cortical evoked responses simultaneously with the spectrotemporal features, pointing to an instantaneous transformation of the unfolding acoustic features into linguistic representations during speech processing.
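As a rough illustration of the stimulus-reconstruction idea described above, the sketch below shows how a stimulus feature (here, a placeholder amplitude envelope) could be reconstructed from multichannel evoked responses with cross-validated ridge regression. The data, dimensions, and regularization are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal sketch of stimulus reconstruction with a linear ridge model.
# Hypothetical shapes: meg is (n_times, n_sensors) of evoked responses,
# envelope is (n_times,) the amplitude envelope of a spoken word.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_times, n_sensors = 600, 204            # placeholder: 1 ms steps, 204 gradiometers
meg = rng.standard_normal((n_times, n_sensors))
envelope = rng.standard_normal(n_times)  # placeholder stimulus feature

# Map the multichannel cortical response at each time point back to the
# stimulus feature; cross-validation guards against overfitting.
model = Ridge(alpha=1.0)
reconstruction = cross_val_predict(model, meg, envelope, cv=5)

# Reconstruction accuracy as the correlation between the actual and
# reconstructed envelopes (a common metric in decoding studies).
r = np.corrcoef(envelope, reconstruction)[0, 1]
print(f"reconstruction accuracy r = {r:.2f}")
```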

Highlights

  • Humans effortlessly recognize and react to natural sounds but are especially tuned to speech

  • We investigated whether changes in the amplitude envelope of the spoken words, which correspond to the slow temporal modulations of the speech rhythm, are important for successful decoding of spoken words, by separately decoding the amplitude envelope with a convolution model (see the sketch after this list)

  • The results reveal remarkably high classification performance for the spoken words (91%; significant difference from spectrogram decoding for speech, Z = 3.5, p < 0.001; Fig. 1C) and further highlight the importance of the temporal aspects of the stimulus for speech decoding
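The sketch below illustrates one common way to set up such a convolution model: the amplitude envelope is extracted with a Hilbert transform and low-pass filtered to retain the slow temporal modulations, and a lagged linear model (a temporal response function) maps the envelope to a cortical signal. The data, filter settings, and lag range are illustrative assumptions rather than the study's actual parameters.

```python
# Sketch of a "convolution model": the stimulus amplitude envelope is
# expanded over time lags and mapped linearly to an MEG signal, which is
# equivalent to convolving the envelope with a learned response function.
# All names and parameters here are illustrative, not the authors' setup.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt
from sklearn.linear_model import Ridge

fs = 1000                                    # assumed sampling rate (Hz)
rng = np.random.default_rng(1)
audio = rng.standard_normal(fs)              # placeholder 1 s spoken word

# Amplitude envelope: magnitude of the analytic signal, low-pass filtered
# to keep the slow (< ~10 Hz) modulations of the speech rhythm.
env = np.abs(hilbert(audio))
b, a = butter(4, 10 / (fs / 2), btype="low")
env = filtfilt(b, a, env)

# Lagged design matrix over 0-250 ms so the linear fit acts as a convolution
# of the envelope with a temporal response function.
lags = np.arange(int(0.25 * fs))
X = np.zeros((len(env), len(lags)))
for i, lag in enumerate(lags):
    X[lag:, i] = env[:len(env) - lag]

meg_channel = rng.standard_normal(len(env))  # placeholder MEG trace
trf = Ridge(alpha=1.0).fit(X, meg_channel)
predicted = trf.predict(X)                   # model's prediction of the MEG signal
```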


Introduction

Humans effortlessly recognize and react to natural sounds but are especially tuned to speech. Numerous studies have attempted to localize speech-specific processing stages in the brain (Price et al., 2005; Zatorre and Gandour, 2008; Schirmer et al., 2012), but the reported differences between speech and other sounds have remained subtle. Superior temporal cortices show sensitivity to spectrotemporal features of speech that correspond to different phonemes (Chang et al., 2010; Mesgarani et al., 2014), and to the temporal structure of speech sounds but not of other sounds with similar acoustic content (Overath et al., 2015). Detailed tracking of the temporal modulations that distinguish between phonemes may be crucial for speech processing (Poeppel, 2003; Zatorre and Gandour, 2008).

