Abstract

Searching for and recognizing objects in complex natural scenes is accomplished by multiple saccades, which successively bring candidate objects within the reduced receptive fields of inferior temporal cortex (IT) neurons. We analyze and model how the dorsal and ventral visual streams both contribute to this process. Saliency detection in the dorsal visual system, including area LIP, is modeled by graph-based visual saliency, and allows the eyes to fixate potential objects within several degrees. Visual information at the fixated location, subtending approximately 9° corresponding to the receptive fields of IT neurons, is then passed through a four-layer hierarchical model of the ventral cortical visual system, VisNet. We show that VisNet can be trained using a synaptic modification rule with a short-term memory trace of recent neuronal activity to capture both the required view and translation invariances, allowing approximately 90% correct object recognition in the model for 4 objects shown in any view across a range of 135° anywhere in a scene. The model was able to generalize correctly within the four trained views and the 25 trained translations. This approach analyzes the principles by which complementary computations in the dorsal and ventral visual cortical streams enable objects to be located and recognized in complex natural scenes.
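The "synaptic modification rule with a short-term memory trace of recent neuronal activity" can be sketched as follows. This is a minimal illustration only: the exponential form of the trace, the rectified-linear firing, and the parameter values (`eta`, `alpha`) are assumptions for exposition, not the paper's exact implementation.

```python
import numpy as np

def trace_learning_step(w, x, y_trace_prev, eta=0.8, alpha=0.01):
    """One trace-rule synaptic update (illustrative sketch).

    w            : synaptic weight vector of one postsynaptic neuron
    x            : presynaptic firing-rate vector for the current input
    y_trace_prev : the neuron's memory trace from the previous time step
    eta          : trace decay factor (assumed value)
    alpha        : learning rate (assumed value)
    """
    # Postsynaptic firing: linear sum of weighted inputs, rectified
    # (an assumption; VisNet layers also involve competition).
    y = max(0.0, float(w @ x))
    # Exponentially decaying short-term memory trace of recent activity;
    # this is what lets successive transformed views of the same object
    # be associated onto the same output neurons.
    y_trace = (1 - eta) * y + eta * y_trace_prev
    # Hebb-like update using the trace instead of the instantaneous rate.
    w = w + alpha * y_trace * x
    return w, y_trace
```

Because the trace carries activity across successive fixations or transforms of one object, inputs that arrive close together in time strengthen onto the same postsynaptic neuron, which is how translation and view invariance can be learned from temporal continuity.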

Highlights

  • One of the major problems that is solved by the visual system in the cerebral cortex is the building of a representation of visual information that allows object and face recognition to occur relatively independently of size, contrast, spatial frequency, position on the retina, angle of view, lighting, etc.

  • The research described here emphasizes that because the eyes do not locate the center of objects based on saliency, translation invariance, as well as view, size, etc., invariance needs to be implemented in the ventral visual system.

  • We show how a model of invariant object recognition in the ventral visual system, VisNet, can perform the required combination of translation and view invariant recognition, can generalize between views of objects that are 45° apart during training, and can generalize to intermediate locations when trained on a coarse training grid with the spacing between trained locations equivalent to 1–3°.


Introduction

One of the major problems that is solved by the visual system in the cerebral cortex is the building of a representation of visual information that allows object and face recognition to occur relatively independently of size, contrast, spatial frequency, position on the retina, angle of view, lighting, etc. One mechanism that the brain uses to simplify the task of recognizing objects in complex natural scenes is that the receptive fields of inferior temporal cortex neurons change from approximately 70° in diameter when tested under classical neurophysiology conditions with a single stimulus on a blank screen to as little as a radius of 8° (for a 5° stimulus) when tested in a complex natural scene (Rolls et al., 2003; Aggelopoulos and Rolls, 2005), with consistent findings described by Sheinberg and Logothetis (2001). This greatly simplifies the task for the object recognition system, for instead of dealing with the whole scene as in traditional computer vision approaches, the brain processes just a small fixated region of a complex natural scene at any one time, and the eyes are then moved to another part of the scene. The inferior temporal cortex neurons respond to the object being fixated with view, size, and rotation invariance (Rolls, 2012), and need some translation invariance, for the eyes may not be fixating the center of the object when the inferior temporal cortex neurons respond (Rolls et al., 2003).
