Abstract

Attentional selection is a function that allocates the brain's computational resources to the most important part of a visual scene at a specific moment. Saliency map models have been proposed as computational models to predict the spatial location of attentional selection. Recent saliency map models based on deep convolutional neural networks (DCNNs) exhibit the highest performance for predicting the locations of attentional selection and human gaze, which reflect overt attention. Trained DCNNs potentially provide insight into the perceptual mechanisms of biological visual systems. However, the relationship between the artificial and neural representations used for determining attentional selection and gaze location remains unknown. To understand the mechanism underlying DCNN-based saliency map models and the neural system of attentional selection, we investigated the correspondence between the layers of a DCNN saliency map model and monkey visual areas in their representations of natural images. We compared the response characteristics of each layer of the model with those of the neural representations in the primary visual (V1), intermediate visual (V4), and inferior temporal (IT) cortices. Regardless of the layer level, the response characteristics of the model were consistent with those of the neural representation in V1. We found marked peaks of correspondence between V1 and the early and higher-intermediate layers of the model. These results provide insight into the mechanism of the trained DCNN saliency map model and suggest that the neural representations in V1 play an important role in computing the saliency that mediates attentional selection, supporting the V1 saliency hypothesis.
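
The layer-by-layer comparison described above can be illustrated with a representational-similarity-style analysis. The following is a minimal sketch, not the authors' exact pipeline: the variable names, the use of Pearson-correlation dissimilarity, and the stimulus/response shapes are assumptions for illustration only.

```python
import numpy as np

def rdm(responses):
    """Representational dissimilarity matrix.

    responses: array of shape (n_images, n_units) -- one response pattern per image.
    Returns an (n_images, n_images) matrix of 1 - Pearson correlation between patterns.
    """
    return 1.0 - np.corrcoef(responses)

def layer_area_correspondence(layer_acts, area_resp):
    """Correlate the upper triangles of a model-layer RDM and a brain-area RDM."""
    m, b = rdm(layer_acts), rdm(area_resp)
    iu = np.triu_indices_from(m, k=1)  # unique image pairs only
    return np.corrcoef(m[iu], b[iu])[0, 1]

# Hypothetical usage: compare every DCNN layer with recordings from V1, V4, and IT.
# layer_acts is a dict {layer_name: (n_images, n_units) array} of model activations,
# and v1_resp, v4_resp, it_resp are (n_images, n_neurons) arrays of spike counts.
# scores_v1 = {name: layer_area_correspondence(a, v1_resp) for name, a in layer_acts.items()}
```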

Highlights

  • Attentional selection enables the brain to allocate its computational resources to the most important part of a visual scene at a specific moment (Posner, 1980) and to establish visual perception (Carrasco, 2011; Yang et al., 2018).

  • We found that the response characteristics of the trained deep convolutional neural network (DCNN) model for attentional selection were consistent with those of the representation in the primary visual cortex (V1), suggesting that V1 activities underlie the neural representation of saliency in the visual field that exogenously guides attentional selection.

  • We observed similar results when analyzing the trained VGG16 model provided by the Chainer framework: the activity characteristics of the early layers of the VGG16 model trained for object classification agreed with those of the neural representation in V1, whereas model neurons from the intermediate to deep layers of VGG16 exhibited characteristics similar to the neural representation in V4. This implies that the mechanism of the trained DCNN saliency map model might be distinct from that of the VGG16 object-classification model (see the sketch after this list).
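
As a concrete illustration of how intermediate activations can be read out from the pretrained Chainer VGG16 model mentioned above, the snippet below is a hedged sketch: the chosen layers, image file, and flattening step are hypothetical and not taken from the paper.

```python
from PIL import Image
from chainer.links import VGG16Layers

# Loads Chainer's VGG16 pretrained on ImageNet (weights are downloaded on first use).
model = VGG16Layers()

# Stand-in stimulus; the paper's actual natural-image set is not reproduced here.
img = Image.open("natural_scene.jpg")

# Extract activations from a few layers spanning early to deep processing stages.
acts = model.extract([img], layers=["conv1_2", "conv3_3", "conv5_3", "fc7"])

# Flatten each layer's activation into a single response vector per image.
per_layer = {name: var.array.reshape(var.shape[0], -1) for name, var in acts.items()}
```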



Introduction

Attentional selection enables the brain to allocate its computational resources to the most important part of a visual scene at a specific moment (Posner, 1980) and to establish visual perception (Carrasco, 2011; Yang et al., 2018). Saliency maps have been proposed as a biologically plausible model for predicting attentional selection within a presented visual scene (Itti and Koch, 2000). In this model, the most salient location in a visual scene induces attentional selection. Since the original model (Itti et al., 1998), various saliency map models based on the visual system have been proposed (Russell et al., 2014; Wagatsuma, 2019; Uejima et al., 2020), in which the activities of model neurons in early vision constitute the first, and necessary, stage for organizing the saliency map. The crucial role of responses in the primary visual cortex (V1) in computing visual saliency has been demonstrated by various physiological, psychophysical, and computational studies (the V1 saliency hypothesis; Li, 1999a, 2002; Jingling and Zhaoping, 2008; Zhaoping, 2014, 2019; Yan et al., 2018).
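
To make the saliency map idea concrete, the sketch below computes a toy saliency map in the spirit of Itti and Koch (center-surround contrast on intensity and color-opponency channels, normalization, and a winner-take-all readout). It is an illustrative simplification under assumed parameters, not the published model or the DCNN saliency map model studied here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def center_surround(feature_map, sigma_center=1.0, sigma_surround=5.0):
    """Difference-of-Gaussians approximation of a center-surround operator."""
    return np.abs(gaussian_filter(feature_map, sigma_center)
                  - gaussian_filter(feature_map, sigma_surround))

def saliency_map(image):
    """Toy saliency map from intensity and crude color-opponency channels."""
    r, g, b = image[..., 0], image[..., 1], image[..., 2]
    channels = [(r + g + b) / 3.0,   # intensity
                r - g,               # red-green opponency
                b - (r + g) / 2.0]   # blue-yellow opponency
    conspicuity = [center_surround(c) for c in channels]
    # Normalize each conspicuity map to [0, 1] before combining them.
    conspicuity = [(m - m.min()) / (m.max() - m.min() + 1e-8) for m in conspicuity]
    return np.mean(conspicuity, axis=0)

# Winner-take-all readout: the most salient location is the predicted attentional target.
# image = np.random.rand(256, 256, 3)   # stand-in for a natural scene, values in [0, 1]
# s = saliency_map(image)
# y, x = np.unravel_index(np.argmax(s), s.shape)
```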

