Abstract

Computational visual encoding models play a key role in understanding the stimulus–response characteristics of neuronal populations in the visual cortex of the brain. However, building such models is typically hampered by the difficulty of constructing non-linear feature spaces that fit the neuronal responses effectively. In this work, we propose the GaborNet visual encoding (GaborNet-VE) model, a novel end-to-end encoding model for the visual ventral stream. The model comprises a Gabor convolutional layer, two regular convolutional layers, and a fully connected layer. Its key design principle is to replace the regular convolutional kernels in the first convolutional layer with Gabor kernels that have learnable parameters. A single GaborNet-VE model efficiently and simultaneously encodes all voxels in one region of interest (ROI) of functional magnetic resonance imaging (fMRI) data. The experimental results show that the proposed model achieves state-of-the-art prediction performance for the primary visual cortex. Moreover, the visualizations reveal regularities in how each ROI fits the visual features, as well as the estimated receptive fields. These results suggest that the lightweight, region-based GaborNet-VE model, which combines handcrafted and deep-learning features, exhibits both good expressiveness and biological interpretability.
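
To make the architecture described above concrete, the following is a minimal PyTorch sketch of a GaborNet-VE-style encoder in which the first convolutional layer is parameterized directly by learnable Gabor parameters (orientation, wavelength, envelope width, phase, and aspect ratio) rather than by free kernel weights. The class names `LearnableGaborConv2d` and `GaborNetVE`, the channel counts, the kernel size, and the assumed 128×128 grayscale input are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of a GaborNet-VE-style encoder; hyperparameters are illustrative.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableGaborConv2d(nn.Module):
    """First convolutional layer whose kernels are Gabor functions with
    learnable orientation, wavelength, envelope width, phase, and aspect ratio."""
    def __init__(self, out_channels, kernel_size=11):
        super().__init__()
        self.kernel_size = kernel_size
        self.theta = nn.Parameter(torch.rand(out_channels) * math.pi)   # orientation
        self.lambd = nn.Parameter(torch.rand(out_channels) * 8 + 2)     # wavelength
        self.sigma = nn.Parameter(torch.rand(out_channels) * 3 + 1)     # envelope width
        self.psi   = nn.Parameter(torch.rand(out_channels) * math.pi)   # phase offset
        self.gamma = nn.Parameter(torch.ones(out_channels) * 0.5)       # aspect ratio

    def build_kernels(self):
        half = self.kernel_size // 2
        ys, xs = torch.meshgrid(
            torch.arange(-half, half + 1, dtype=torch.float32),
            torch.arange(-half, half + 1, dtype=torch.float32),
            indexing="ij",
        )
        xs, ys = xs[None, :, :], ys[None, :, :]                          # (1, k, k)
        theta = self.theta[:, None, None]
        x_rot =  xs * torch.cos(theta) + ys * torch.sin(theta)
        y_rot = -xs * torch.sin(theta) + ys * torch.cos(theta)
        envelope = torch.exp(-(x_rot ** 2 + (self.gamma[:, None, None] * y_rot) ** 2)
                             / (2 * self.sigma[:, None, None] ** 2))
        carrier = torch.cos(2 * math.pi * x_rot / self.lambd[:, None, None]
                            + self.psi[:, None, None])
        return (envelope * carrier).unsqueeze(1)                         # (out, 1, k, k)

    def forward(self, x):                                                # x: (N, 1, H, W)
        return F.conv2d(x, self.build_kernels(), padding=self.kernel_size // 2)

class GaborNetVE(nn.Module):
    """Gabor conv layer + two regular conv layers + one fully connected layer
    that predicts the responses of all voxels in a single ROI."""
    def __init__(self, n_voxels, gabor_channels=32):
        super().__init__()
        self.gabor = LearnableGaborConv2d(gabor_channels)
        self.conv1 = nn.Conv2d(gabor_channels, 64, 3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(64, 64, 3, stride=2, padding=1)
        self.fc    = nn.Linear(64 * 32 * 32, n_voxels)   # assumes 128x128 grayscale input

    def forward(self, x):
        x = F.relu(self.gabor(x))
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        return self.fc(torch.flatten(x, 1))              # predicted voxel responses
```

Because the Gabor kernels are rebuilt from their parameters on every forward pass, gradients flow through the orientation, frequency, and phase parameters, so the first layer stays constrained to the Gabor family while still being trained end to end with the rest of the network.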

Highlights

  • Human and primate visual systems are exceedingly adept at performing complicated vision tasks based on rudimentary visual perception

  • Qiao et al. (2019) designed an end-to-end convolutional neural network (CNN) regression (ETECR) model for visual encoding based on functional magnetic resonance imaging (fMRI) data. That model combined linear and non-linear mapping components within the CNN architecture, which was trained on experimental stimuli and the corresponding fMRI signals

  • In comparison with three reference models, our model achieved state-of-the-art prediction performance for the primary visual cortex and comparable prediction performance for intermediate and higher visual cortical areas

Summary

INTRODUCTION

Human and primate visual systems are exceedingly adept at performing complicated vision tasks based on rudimentary visual perception. Kay et al. (2008) proposed the Gabor wavelet pyramid (GWP) visual encoding model, which consists of an over-complete basis of phase-invariant Gabor wavelets with different positions, orientations, and spatial frequencies; this model processes visual stimuli to generate a non-linear feature space with good expressiveness and interpretability. Qiao et al. (2019) combined linear and non-linear mapping components into a CNN architecture (the ETECR model) trained on experimental stimuli and the corresponding fMRI signals; that model could learn optimal feature representations and linear regression weights for visual cortical responses and achieved major improvements in prediction performance. In our results, voxels in V1 and V2 prefer Gabor kernels with high spatial frequencies, whereas voxels in V4 and LO prefer Gabor kernels with low spatial frequencies, although individual voxels with the opposite preference exist in all areas. These results suggest that the lightweight GaborNet-VE model, based on combining handcrafted and deep ROI features, has both good expressiveness and biological interpretability. Our work connects deep learning with neuroscience and promotes both the development of artificial intelligence and the understanding of human intelligence.
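
For intuition about the phase-invariant Gabor wavelets underlying the GWP model, the sketch below computes a local contrast-energy feature from a quadrature (even/odd) Gabor pair, so that the response depends on local orientation and spatial frequency but not on stimulus phase. The function names, filter size, and frequency/orientation grid are assumptions chosen for illustration; they are not taken from the GWP implementation.

```python
# Hedged sketch of a phase-invariant Gabor wavelet feature, in the spirit of the
# GWP encoding model; parameters and names are illustrative assumptions.
import numpy as np

def gabor_pair(size, freq, theta, sigma):
    """Quadrature (even/odd) Gabor pair at one orientation and spatial frequency."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_rot = xs * np.cos(theta) + ys * np.sin(theta)
    envelope = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma ** 2))
    even = envelope * np.cos(2 * np.pi * freq * x_rot)   # cosine phase
    odd  = envelope * np.sin(2 * np.pi * freq * x_rot)   # sine phase
    return even, odd

def phase_invariant_response(patch, freq, theta, sigma):
    """Local contrast energy: squaring and summing the quadrature-pair outputs
    removes the dependence on the phase of the stimulus."""
    even, odd = gabor_pair(patch.shape[0], freq, theta, sigma)
    return np.sqrt(np.sum(patch * even) ** 2 + np.sum(patch * odd) ** 2)

# Example: responses of a small filter bank (3 frequencies x 4 orientations)
# to a single random image patch.
rng = np.random.default_rng(0)
patch = rng.standard_normal((21, 21))
features = [
    phase_invariant_response(patch, freq=f, theta=t, sigma=4.0)
    for f in (0.1, 0.2, 0.3)                              # cycles per pixel
    for t in np.linspace(0, np.pi, 4, endpoint=False)
]
print(len(features), "phase-invariant Gabor features")
```

Stacking such responses over many positions, orientations, and spatial frequencies yields the kind of over-complete, non-linear feature space that the GWP model maps linearly onto voxel responses.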

MATERIALS AND METHODS
RESULTS
DISCUSSION
Limitations and Future
CONCLUSION
ETHICS STATEMENT