When we listen to speech, our brain's neurophysiological responses "track" its acoustic features, but how these auditory responses are enhanced by linguistic content is less well understood. Here, we recorded magnetoencephalography (MEG) responses while subjects of both sexes listened to four types of continuous, speech-like passages: speech-envelope-modulated noise, English-like non-words, scrambled words, and a narrative passage. Temporal response function (TRF) analysis provides strong neural evidence for the emergent features of speech processing in cortex, from acoustics to higher-level linguistics, as incremental steps in neural speech processing. Critically, we show a hierarchical progression of progressively higher-order features over time, reflected in both bottom-up (early) and top-down (late) processing stages. Linguistically driven top-down mechanisms take the form of late N400-like responses, suggesting a central role for predictive coding mechanisms at multiple levels. As expected, neural processing of lower-level acoustic features is bilateral or right-lateralized, with left lateralization emerging only for lexical-semantic features. Finally, our results identify potential neural markers of speech comprehension, rather than mere speech perception: late linguistic-level responses derived from TRF components modulated by linguistic content.

Significance Statement
We investigate neural processing mechanisms as speech evolves from acoustic signals to meaningful language, using stimuli ranging from passages without any linguistic information to fully well-formed linguistic content. Computational models based on the speech and linguistic hierarchy reveal that cortical responses time-lock to emergent features, from acoustics to sentence-level linguistic processes, as the semantic information in the acoustic input increases. Temporal response functions (TRFs) uncover millisecond-level processing dynamics as speech and language stages unfold. Each speech feature undergoes early and late processing stages, with the former driven by bottom-up activation and the latter influenced by top-down mechanisms. These insights enhance our understanding of the hierarchical nature of auditory language processing.
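For readers unfamiliar with the method, TRF estimation can be sketched as time-lagged regularized regression: the neural response is modeled as stimulus features (e.g., the acoustic envelope, word-onset or surprisal predictors) convolved with unknown per-feature filters, which are recovered by least squares with a ridge penalty. The following is a minimal illustrative sketch on simulated data, not the authors' analysis pipeline; the feature names, lag range, and regularization value are assumptions for the example only.

```python
# Minimal sketch of TRF estimation via time-lagged ridge regression
# (illustrative only; simulated data, not the study's MEG pipeline).
import numpy as np

def lag_matrix(stimulus, n_lags):
    """Design matrix of time-lagged copies of each stimulus feature.

    stimulus : (n_times, n_features) array, e.g. acoustic envelope and a
               word-level predictor sampled at the neural sampling rate.
    n_lags   : number of lags in samples (e.g. 0-800 ms at 100 Hz -> 80).
    """
    n_times, n_features = stimulus.shape
    X = np.zeros((n_times, n_features * n_lags))
    for lag in range(n_lags):
        # column f * n_lags + lag holds feature f delayed by `lag` samples
        X[lag:, lag::n_lags] = stimulus[:n_times - lag, :]
    return X

def fit_trf(stimulus, response, n_lags, alpha=1.0):
    """Estimate TRFs by ridge regression; returns (n_features, n_lags) filters."""
    X = lag_matrix(stimulus, n_lags)
    XtX = X.T @ X + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(XtX, X.T @ response)
    return w.reshape(stimulus.shape[1], n_lags)

# Toy usage: two predictors (acoustic- and word-level) and one simulated channel.
rng = np.random.default_rng(0)
fs, dur, n_lags = 100, 60, 80                      # 100 Hz, 60 s, lags to 800 ms
stim = rng.standard_normal((fs * dur, 2))
true_trf = np.vstack([np.hanning(n_lags), -0.5 * np.hanning(n_lags)])
resp = lag_matrix(stim, n_lags) @ true_trf.reshape(-1) \
       + 0.1 * rng.standard_normal(fs * dur)
trf = fit_trf(stim, resp, n_lags, alpha=10.0)       # recovers the two filters
```

In practice, studies of this kind typically compare the cross-validated prediction accuracy of nested feature sets (acoustic only versus acoustic plus linguistic predictors) and inspect the latency of TRF peaks, which is how early (bottom-up) and late (top-down, N400-like) response components are distinguished.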