Abstract

Convolutional neural networks (CNNs) are increasingly used to model human vision due to their high object categorization capabilities and general correspondence with human brain responses. Here we evaluate the performance of 14 different CNNs compared with human fMRI responses to natural and artificial images using representational similarity analysis. Despite the presence of some CNN-brain correspondence, and despite CNNs’ impressive ability to fully capture lower-level visual representations of real-world objects, we show that CNNs do not fully capture higher-level visual representations of real-world objects, nor those of artificial objects at either lower or higher levels of visual representation. The latter is particularly critical, as the processing of both real-world and artificial visual stimuli engages the same neural circuits. We report similar results regardless of differences in CNN architecture, training, or the presence of recurrent processing. This indicates that some fundamental differences exist in how the brain and CNNs represent visual information.

Highlights

  • Convolutional neural networks (CNNs) are increasingly used to model human vision due to their high object categorization capabilities and general correspondence with human brain responses

  • Following Khaligh-Razavi and Kriegeskorte [8] and Cichy et al. [5], and using the lower bound of the noise ceiling from the human brain data as our threshold, we examined how well visual representational structures in the human brain may be captured by CNNs; here, “fully capture” means that the brain-CNN correlation is as good as the brain-brain correlation between the human participants, which in turn indicates that a CNN is able to account for the total amount of explainable brain variance

  • Because the representational similarity analysis (RSA) approach allows easy comparisons of multiple functional magnetic resonance imaging (fMRI) data sets with multiple CNNs, and because a noise ceiling can be derived to quantify the degree of the brain-CNN correspondence, we used this approach in the present study (a minimal sketch of the analysis follows this list)
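
The sketch below illustrates the kind of RSA comparison and noise-ceiling criterion described in the highlights above. It is not the authors' code: the input names and shapes (brain_patterns as per-subject condition-by-voxel arrays for one region of interest, cnn_features as a condition-by-unit array for one CNN layer), as well as the choices of correlation distance and Spearman correlation, are illustrative assumptions.

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.stats import spearmanr

    def rdm(patterns):
        # Representational dissimilarity matrix in condensed form:
        # 1 - Pearson correlation between every pair of condition patterns.
        return pdist(patterns, metric="correlation")

    def brain_cnn_correlation(brain_patterns, cnn_features):
        # Correlate the group-average brain RDM with one CNN layer's RDM.
        mean_brain_rdm = np.mean([rdm(p) for p in brain_patterns], axis=0)
        return spearmanr(mean_brain_rdm, rdm(cnn_features))[0]

    def noise_ceiling_lower_bound(brain_patterns):
        # Lower bound of the noise ceiling: correlate each subject's RDM
        # with the mean RDM of the remaining subjects, then average.
        rdms = [rdm(p) for p in brain_patterns]
        rhos = []
        for i in range(len(rdms)):
            others = np.mean([r for j, r in enumerate(rdms) if j != i], axis=0)
            rhos.append(spearmanr(rdms[i], others)[0])
        return float(np.mean(rhos))

    # "Fully capture" in the sense used above: the brain-CNN correlation
    # for a layer reaches the lower bound of the noise ceiling, i.e.
    # brain_cnn_correlation(...) >= noise_ceiling_lower_bound(...)

Under this criterion, a CNN layer that correlates with the group brain data as strongly as individual participants do accounts for all of the explainable variance in that brain region.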



Introduction

We reevaluate the key fMRI finding that representations formed in lower and higher layers of a CNN can track those of the lower and higher human visual processing regions, respectively. An alternative to RSA is to learn a direct mapping between fMRI voxels and CNN units; while this is a valid approach, it is computationally costly and requires large amounts of training data, because a large number of fMRI voxels must be mapped to an even larger number of CNN units (a minimal sketch of such a mapping follows below).
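
The following sketch makes concrete why the voxel-unit mapping approach mentioned above is data hungry and computationally costly. The use of ridge regression, the mapping direction (predicting voxels from units), the toy array sizes, and all variable names are assumptions for illustration, not the authors' implementation; random data stand in for real recordings.

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    n_train, n_test = 800, 200        # stimuli with measured responses
    n_units, n_voxels = 4096, 2000    # one CNN layer vs. one brain region

    X_train = rng.standard_normal((n_train, n_units))   # CNN activations
    Y_train = rng.standard_normal((n_train, n_voxels))  # fMRI responses
    X_test = rng.standard_normal((n_test, n_units))
    Y_test = rng.standard_normal((n_test, n_voxels))

    # A single regularized linear map already has n_units * n_voxels
    # weights (about 8 million here), which is why fitting it well demands
    # far more training stimuli than a typical fMRI experiment provides.
    model = Ridge(alpha=1.0).fit(X_train, Y_train)
    Y_pred = model.predict(X_test)

    # Per-voxel prediction accuracy: Pearson r between predicted and
    # observed held-out responses (near 0 here, since the data are random).
    r = [np.corrcoef(Y_pred[:, v], Y_test[:, v])[0, 1]
         for v in range(n_voxels)]
    print(f"mean voxel prediction r: {np.mean(r):.3f}")

RSA sidesteps this fitting step entirely by comparing representational geometries rather than learning voxel-wise mappings, which is why it scales easily to many data sets and many CNNs.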
