Abstract

Representation invariance plays a significant role in the performance of deep convolutional neural networks (CNNs) and in human visual information processing across a variety of complex image-based tasks. However, considerable confusion remains about the representation invariance mechanisms of these two sophisticated systems. To investigate their relationship under common conditions, we propose a representation invariance analysis approach based on data augmentation. First, the original image library was expanded by data augmentation. The representation invariance of CNNs and of the ventral visual stream was then studied by comparing, before and after augmentation, the similarity of features at corresponding CNN layers and the prediction performance of visual encoding models based on functional magnetic resonance imaging (fMRI). Our experimental results suggest that the architecture of CNNs, namely the combination of convolutional and fully connected layers, gives rise to their representation invariance. Remarkably, we found that representation invariance is present at every successive stage of the ventral visual stream. These results reveal an internal correspondence between CNNs and the human visual system with respect to representation invariance. Our study promotes invariant representation in computer vision and a deeper understanding of the representation invariance mechanisms of human visual information processing.
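The core comparison described above, i.e., how similar a network's layer features remain when the input is replaced by an augmented copy, can be illustrated with a minimal numpy sketch. This is not the paper's implementation; the toy image, the horizontal-flip augmentation, and the use of raw pixels versus channel-wise global average pooling as stand-ins for an "early" and a "later" layer are all illustrative assumptions. It does show the underlying idea that architectural operations such as pooling can make features more invariant to an augmentation:

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    """Cosine similarity between two flattened feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

img = rng.random((3, 8, 8))      # toy RGB image (channels, height, width)
aug = img[:, :, ::-1]            # data augmentation: horizontal flip

# Stand-in for an early representation: flattened pixels, no pooling.
raw_sim = cosine(img.ravel(), aug.copy().ravel())

# Stand-in for a later representation: channel-wise global average pooling,
# which discards spatial layout and is therefore exactly flip-invariant.
pooled_sim = cosine(img.mean(axis=(1, 2)), aug.mean(axis=(1, 2)))
```

Here `pooled_sim` is 1.0 (up to floating point) because averaging over spatial positions is unaffected by the flip, while `raw_sim` is strictly smaller; comparing such similarities before and after augmentation, layer by layer, is the spirit of the analysis.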

Highlights

  • Deep convolutional neural networks (CNNs) have achieved unprecedented success in computer vision and have also attracted the attention of researchers in psychology and neuroscience

  • Using CNNs to model the human visual system based on functional magnetic resonance imaging is becoming a bridge between artificial intelligence (AI) and human intelligence

  • The structure of CNNs was initially inspired by human visual information processing, which leads to some natural similarities between CNNs and the human visual system [8,9,10]


Summary

Introduction

Deep convolutional neural networks (CNNs) have achieved unprecedented success in computer vision and have also attracted the attention of researchers in psychology and neuroscience. It has been debated whether the combination of convolutional layers and fully connected layers is an important factor in the transform-invariant property of CNNs. Because CNNs derive from the hierarchical ventral visual stream [28,29], exploring representation invariance in the visual information processing of the human brain is widely regarded as both practical for understanding the brain and valuable for analyzing the transform-invariant property of CNNs. Several empirical studies have examined whether neurons show lower invariance for progressively more complex stimulus features in each successive region of interest (ROI) of the ventral visual stream [30,31,32,33,34,35]. This study will deepen the understanding of the invariant representation mechanisms of the brain's visual system and of CNNs, and promote the development of invariant representation in computer vision.
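The fMRI encoding-model side of the analysis can also be sketched. The standard setup, which the abstract alludes to, maps stimulus features to voxel responses with a regularized linear model and scores prediction performance on held-out stimuli. The synthetic features, the voxel responses, the ridge penalty, and the Pearson-correlation score below are all illustrative assumptions, not the paper's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(1)

n_train, n_test, n_feat, n_vox = 200, 50, 40, 10
# Synthetic stand-ins: CNN layer features per stimulus image, and voxel
# responses generated as a noisy linear function of those features.
X = rng.standard_normal((n_train + n_test, n_feat))
W_true = rng.standard_normal((n_feat, n_vox))
Y = X @ W_true + 0.1 * rng.standard_normal((n_train + n_test, n_vox))

Xtr, Xte = X[:n_train], X[n_train:]
Ytr, Yte = Y[:n_train], Y[n_train:]

# Ridge regression: closed-form solution of (X'X + aI) W = X'Y.
alpha = 1.0
W = np.linalg.solve(Xtr.T @ Xtr + alpha * np.eye(n_feat), Xtr.T @ Ytr)
pred = Xte @ W

# Prediction performance: per-voxel Pearson r on held-out images,
# averaged over voxels.
r = [np.corrcoef(pred[:, v], Yte[:, v])[0, 1] for v in range(n_vox)]
mean_r = float(np.mean(r))
```

Comparing `mean_r` for features computed from original versus augmented images, per CNN layer and per ROI, is the kind of before/after contrast the proposed analysis relies on.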

The fMRI Data
Invariant Representation and Data Augmentation Technology
Visual Encoding Models Based on fMRI
The Representation Invariance of Human Visual Processing
The Interaction between CNNs and the Human Brain Visual System
Findings
Future Work
