Abstract

Speech signals have a unique long-term modulation spectrum that is distinct from those of environmental noise, music, and non-speech vocalizations. Does the human auditory system adapt to the long-term modulation spectrum of speech and efficiently extract critical information from speech signals? To answer this question, we tested whether neural responses to speech signals can be captured by non-speech acoustic stimuli with specific modulation spectra. We generated amplitude-modulated (AM) noise with the speech modulation spectrum and with 1/f modulation spectra of different exponents to imitate the temporal dynamics of different natural sounds. We presented these AM stimuli and a 10-min piece of natural speech to 19 human participants undergoing electroencephalography (EEG) recording. We derived temporal response functions (TRFs) to the AM stimuli of different spectral shapes and found distinct neural dynamics for each type of TRF. We then used the TRFs of the AM stimuli to predict neural responses to speech and found that (1) the TRFs of AM stimuli with modulation-spectrum exponents of 1, 1.5, and 2 preferentially captured EEG responses to speech in the δ band, and (2) the θ band of the neural responses to speech could be captured by the AM stimuli with an exponent of 0.75. Our results suggest that the human auditory system shows specificity to the long-term modulation spectrum and is equipped with characteristic neural algorithms tailored to extract critical acoustic information from speech signals.

Highlights

  • Sensory systems evolve to adapt to environmental statistics and to efficiently extract features in natural stimuli essential to animals’ survival (Barlow, 1961)

  • We generated amplitude-modulated (AM) sounds (AM stimuli) with various shapes of long-term modulation spectra to emulate the temporal dynamics of natural sounds, and investigated how the neural signatures of different modulation spectra can be employed to predict the neural responses to speech signals within an encoding framework

  • We showed that the neural responses to speech signals can be predicted by the encoding models derived from modulation spectra similar to the speech modulation spectrum in the δ and θ bands (Figs. 3, 4)
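The encoding framework mentioned above typically estimates a TRF by regressing the EEG signal onto time-lagged copies of the stimulus envelope. The sketch below is a minimal ridge-regression TRF estimator; the function name, lag window, and regularization value are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def trf_ridge(stimulus, eeg, fs=128, tmin=-0.1, tmax=0.4, lam=1e2):
    """Estimate a temporal response function (TRF) by time-lagged ridge
    regression. Illustrative sketch; parameters are not the paper's
    exact settings."""
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    n = len(stimulus)
    # Design matrix: each column is the stimulus shifted by one lag.
    X = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = stimulus[: n - lag]
        else:
            X[:lag, j] = stimulus[-lag:]
    # Ridge solution: w = (X'X + lam * I)^-1 X'y
    w = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ eeg)
    return lags / fs, w
```

The ridge penalty `lam` keeps the estimate stable when neighboring lagged regressors are correlated, which is the usual situation with slowly varying speech envelopes. Predicted EEG for a new stimulus is then the lagged design matrix of that stimulus multiplied by `w`.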


Introduction

Sensory systems evolve to adapt to environmental statistics and to efficiently extract features in natural stimuli essential to animals’ survival (Barlow, 1961). One acoustic feature that differentiates speech from other natural sounds is its long-term modulation spectrum (Ding et al., 2017). Natural sounds, such as environmental noise, speech, music, and some vocalizations, often have broadband modulation spectra that show a 1/f pattern, with the exponent indicating how sounds are modulated across various timescales (Voss and Clarke, 1978; Theunissen and Elie, 2014). Compared with environmental noise and some vocalizations, speech has a unique modulation spectrum with a 1/f exponent between 1 and 1.5 (Singh and Theunissen, 2003) and a prominent peak around 4 Hz (Ding et al., 2017; Varnet et al., 2017). Does the human auditory system show sensitivity to the specific shape of the speech long-term modulation spectrum?
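A 1/f modulation spectrum of this kind can be imposed on a noise carrier by shaping the envelope in the frequency domain. The sketch below is one way to do this, assuming a white-noise carrier, random envelope phases, and an upper modulation limit `fmax`; all parameter values are illustrative, and the paper's exact stimulus construction may differ.

```python
import numpy as np

def am_noise_1f(duration=10.0, fs=16000, beta=1.0, fmax=32.0, seed=0):
    """Amplitude-modulated noise whose envelope has an approximately
    1/f**beta modulation power spectrum up to fmax Hz (illustrative
    sketch, not the paper's exact stimulus-generation procedure)."""
    rng = np.random.default_rng(seed)
    n = int(duration * fs)
    # Build the envelope in the frequency domain: magnitudes follow
    # f**(-beta/2) so that power follows 1/f**beta, band-limited to
    # the modulation range, with uniformly random phases.
    freqs = np.fft.rfftfreq(n, 1 / fs)
    mag = np.zeros_like(freqs)
    band = (freqs > 0) & (freqs <= fmax)
    mag[band] = freqs[band] ** (-beta / 2)
    phases = rng.uniform(0, 2 * np.pi, size=freqs.size)
    env = np.fft.irfft(mag * np.exp(1j * phases), n)
    env = (env - env.min()) / (env.max() - env.min())  # rescale to [0, 1]
    carrier = rng.standard_normal(n)  # white-noise carrier
    return env * carrier

stim = am_noise_1f(beta=1.5)  # steeper exponent -> slower modulations
```

Varying `beta` (e.g., 0.75, 1, 1.5, 2) shifts modulation energy between fast and slow timescales while keeping the carrier spectrum flat, which isolates the long-term modulation spectrum as the manipulated variable.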

