Abstract

Recognizing sounds involves the cerebral transformation of input waveforms into semantic representations. Although past research identified the superior temporal gyrus (STG) as a crucial cortical region, the computational fingerprint of these cerebral transformations remains poorly characterized. Here, we contrasted the ability of previously published acoustic, semantic, and sound-to-event deep neural network (DNN) models to account for behavioral estimates of perceived sound dissimilarity and for 7T fMRI responses to natural sounds in the absence of explicit task demands. We find that both perceived dissimilarity and STG fMRI responses are better predicted by sound-to-event DNNs and, within the DNNs, by the layers intermediate between the input acoustic representation and the output semantic embedding. Our findings indicate that the STG encodes intermediate acoustic-to-semantic sound representations that neither acoustic nor semantic models can account for. These representations are compositional in nature and relevant to behavior.
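The model comparison described above can be illustrated, in general terms, by fitting each candidate feature space (acoustic features, DNN layer activations, semantic embeddings) to brain responses with cross-validated regression and comparing out-of-sample prediction accuracy. The sketch below is not the authors' code: all array names, shapes, and the ridge-regression setup are illustrative assumptions, shown only to make the comparison logic concrete.

```python
# Minimal sketch of a cross-validated encoding-model comparison.
# All data here are random placeholders standing in for real stimuli/responses.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n_sounds, n_voxels = 200, 500
fmri = rng.standard_normal((n_sounds, n_voxels))  # hypothetical STG voxel responses

feature_spaces = {  # hypothetical feature matrices, one per candidate model
    "acoustic": rng.standard_normal((n_sounds, 64)),            # e.g., spectrotemporal features
    "dnn_intermediate": rng.standard_normal((n_sounds, 256)),   # e.g., mid-level DNN activations
    "semantic": rng.standard_normal((n_sounds, 300)),           # e.g., label embeddings
}

def cv_prediction_accuracy(X, Y, n_splits=5):
    """Mean voxel-wise correlation between held-out responses and ridge predictions."""
    scores = []
    for train, test in KFold(n_splits, shuffle=True, random_state=0).split(X):
        model = RidgeCV(alphas=np.logspace(-2, 4, 13)).fit(X[train], Y[train])
        pred = model.predict(X[test])
        r = [np.corrcoef(pred[:, v], Y[test][:, v])[0, 1] for v in range(Y.shape[1])]
        scores.append(np.nanmean(r))
    return float(np.mean(scores))

for name, X in feature_spaces.items():
    print(f"{name}: mean held-out voxel correlation = {cv_prediction_accuracy(X, fmri):.3f}")
```

With real data, the model whose features yield the highest held-out accuracy (here, per the abstract, intermediate sound-to-event DNN layers in STG) is taken as the better account of the neural representation.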
