Abstract

Deep neural networks (DNNs) now have powerful representation capabilities. Deep convolutional neural networks (CNNs), which draw inspiration from the visual processing mechanisms of the primate early visual cortex, have outperformed humans on object categorization and have been found to possess many brain-like properties. Recently, vision transformers (ViTs) have emerged as a prominent DNN paradigm and have achieved remarkable improvements over CNNs on many vision tasks. It is natural to ask how brain-like ViTs are. Beyond the model paradigm, we are also interested in how factors such as model size, multimodality, and temporality affect the ability of networks to model the human visual pathway, especially since existing research has been limited to CNNs. In this paper, we systematically evaluate the brain-like properties of 30 kinds of computer vision models, ranging from CNNs and ViTs to their hybrids, by how well they explain activity in the human visual cortex evoked by dynamic stimuli. Experiments on two neural datasets demonstrate that neither the CNN nor the transformer is the optimal model paradigm for modelling the human visual pathway. ViTs exhibit hierarchical correspondences to the visual pathway, as CNNs do. Moreover, we find that multi-modal and temporal networks better explain neural activity across large parts of the visual cortex, whereas a larger model size is not sufficient to bridge the gap between human vision and artificial networks. Our study sheds light on design principles for more brain-like networks. The code is available at https://github.com/QYiZhou/LWNeuralEncoding.
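The evaluation idea summarized above, scoring a network by how well its features explain measured brain activity, is often implemented as a voxel-wise encoding model. The following is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes ridge regression as the mapping and Pearson correlation as the score (the abstract specifies neither), and the function names `fit_ridge`, `voxelwise_pearson`, and `encoding_score` are hypothetical. Real use would replace the arrays with model activations and recorded responses.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


def voxelwise_pearson(Y_true, Y_pred):
    """Pearson correlation between true and predicted responses, per voxel (column)."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    # Guard against division by zero for constant columns.
    return (yt * yp).sum(axis=0) / np.maximum(denom, 1e-12)


def encoding_score(feat_train, vox_train, feat_test, vox_test, alpha=1.0):
    """Fit on training stimuli, then score predictions on held-out stimuli.

    feat_*: (n_stimuli, n_features) model features (assumed precomputed).
    vox_*:  (n_stimuli, n_voxels) brain responses.
    Returns one correlation per voxel.
    """
    # Standardize features with training statistics only.
    mu = feat_train.mean(axis=0)
    sigma = feat_train.std(axis=0) + 1e-8
    X_train = (feat_train - mu) / sigma
    X_test = (feat_test - mu) / sigma

    # Center responses so no explicit intercept column is needed.
    vox_mean = vox_train.mean(axis=0)
    W = fit_ridge(X_train, vox_train - vox_mean, alpha)
    pred = X_test @ W + vox_mean
    return voxelwise_pearson(vox_test, pred)
```

Comparing the per-voxel scores of different networks (or of different layers within one network) across regions of the visual cortex is one way to ask which model is more brain-like in the sense the abstract describes. In practice `alpha` would be chosen by cross-validation rather than fixed.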
