Abstract
Understanding how the human brain processes auditory input remains a challenge. Traditionally, a distinction between lower- and higher-level sound features is made, but their definition depends on a specific theoretical framework and might not match the neural representation of sound. Here, we postulate that constructing a data-driven neural model of auditory perception, with a minimum of theoretical assumptions about the relevant sound features, could provide an alternative approach and possibly a better match to the neural responses. We collected electrocorticography recordings from six patients who watched a long-duration feature film. The raw movie soundtrack was used to train an artificial neural network model to predict the associated neural responses. The model achieved high prediction accuracy and generalized well to a second dataset, where new participants watched a different film. The extracted bottom-up features captured acoustic properties that were specific to the type of sound and were associated with various response latency profiles and distinct cortical distributions. Specifically, several features encoded speech-related acoustic properties with some features exhibiting shorter latency profiles (associated with responses in posterior perisylvian cortex) and others exhibiting longer latency profiles (associated with responses in anterior perisylvian cortex). Our results support and extend the current view on speech perception by demonstrating the presence of temporal hierarchies in the perisylvian cortex and involvement of cortical sites outside of this region during audiovisual speech perception.
Highlights
Our understanding of how the human brain processes auditory input remains incomplete
A deep artificial neural network (ANN) was trained on the raw soundtrack of the movie to predict the associated ECoG responses in the high frequency band (HFB, 60–95 Hz) [24]
We confirmed that our brain-optimized ANN (BO-NN, Fig 1A) model could be successfully applied to a dataset of different participants watching a different audiovisual film (Movie II)
Summary
Our understanding of how the human brain processes auditory input remains incomplete. Our aim is to identify the features that different cortical regions extract from the incoming sound signal, and to understand how they are transformed into high-level representations specific to sound type (speech, music, noise, etc.). Higher-level features have been addressed in neural encoding models of sound processing [8,9], but higher levels of auditory processing are generally more difficult to model because their characteristics (e.g. in speech or music) remain a topic of theoretical investigation. Higher-level features typically require some form of interpretation and labelling that is based on theoretical constructs and may not match cortical representations. Little is known about the mechanisms underlying the transition from lower- to higher-level auditory processing, leaving these levels of explanation disconnected.
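To make the idea of a neural encoding model concrete, the sketch below fits a heavily simplified linear stand-in for the approach described here: a lagged sound feature (here a simulated amplitude envelope) is regressed onto a synthetic HFB-like response, and the fitted weights reveal the response latency. All signals, lags, and the ridge penalty are illustrative assumptions, not the study's actual data or its deep ANN; they only show how prediction accuracy and latency profiles can be read out of an encoding model.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Simulated stand-ins (the actual study used a movie soundtrack and ECoG HFB power) ---
n = 2000                         # time points (e.g., 10 ms bins)
envelope = np.abs(rng.standard_normal(n))          # toy sound amplitude envelope

lag_true = 12                    # hypothetical neural response latency, in bins
hfb = np.roll(envelope, lag_true) + 0.5 * rng.standard_normal(n)
hfb[:lag_true] = rng.standard_normal(lag_true)     # overwrite wrapped-around samples

# --- Lagged feature matrix: envelope delayed by 0..max_lag bins ---
max_lag = 30
X = np.stack([np.roll(envelope, k) for k in range(max_lag + 1)], axis=1)
X[:max_lag] = 0                  # zero out rows containing wrapped samples

# --- Ridge regression: fit on the first half, evaluate on the held-out second half ---
half = n // 2
Xtr, Xte, ytr, yte = X[:half], X[half:], hfb[:half], hfb[half:]
lam = 1.0                        # illustrative ridge penalty
w = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(X.shape[1]), Xtr.T @ ytr)

pred = Xte @ w
r = np.corrcoef(pred, yte)[0, 1]
peak_lag = int(np.argmax(np.abs(w)))

print(f"held-out prediction correlation: {r:.2f}")
print(f"peak weight at lag {peak_lag} bins (simulated latency: {lag_true})")
```

The ANN in the study plays the role of the lagged linear map here, learning its features from the raw waveform instead of having them hand-specified; the latency read-out is analogous to the short- and long-latency profiles reported for the extracted features.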