Abstract

Our ability to perceive a stable visual world in the presence of continuous movements of the body, head, and eyes has long puzzled neuroscientists. We reformulated this problem in the context of hierarchical convolutional neural networks (CNNs)—whose architectures have been inspired by the hierarchical signal processing of the mammalian visual system—and examined perceptual stability as an optimization process that identifies image-defining features for accurate image classification in the presence of movements. Movement signals, multiplexed with visual inputs along overlapping convolutional layers, aided classification invariance for shifted images by making the classification faster to learn and more robust to input noise. Classification invariance was reflected in activity manifolds associated with image categories emerging in late CNN layers, and in network units acquiring movement-associated activity modulations like those observed experimentally during saccadic eye movements. Our findings provide a computational framework that unifies a multitude of biological observations on perceptual stability under optimality principles for image classification in artificial neural networks.
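As an illustration of the architecture described above, the following sketch (in PyTorch) shows one way a movement signal could be multiplexed with visual inputs along convolutional layers of a classifier trained on shifted images. The layer sizes, the 2-D shift vector, and the choice to broadcast and concatenate it onto intermediate feature maps are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a small CNN classifier in which a
# 2-D movement signal (the shift applied to the input image) is broadcast and
# concatenated to intermediate convolutional feature maps, so that "what"
# (image category) and "where" (self-movement) signals share layers.
import torch
import torch.nn as nn


class MovementMultiplexedCNN(nn.Module):
    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        # +2 input channels: the (dx, dy) movement signal broadcast over space.
        self.conv2 = nn.Conv2d(16 + 2, 32, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.readout = nn.Linear(32, n_classes)

    def forward(self, image: torch.Tensor, movement: torch.Tensor) -> torch.Tensor:
        # image: (B, 1, H, W); movement: (B, 2) shift that generated this view.
        h = torch.relu(self.conv1(image))
        b, _, height, width = h.shape
        # Broadcast the movement vector into two constant feature maps and
        # concatenate them with the visual features (multiplexing).
        mov_maps = movement.view(b, 2, 1, 1).expand(b, 2, height, width)
        h = torch.relu(self.conv2(torch.cat([h, mov_maps], dim=1)))
        return self.readout(self.pool(h).flatten(1))


# Usage: classify randomly shifted images while providing the shift as input.
model = MovementMultiplexedCNN()
shifted_images = torch.randn(8, 1, 28, 28)         # placeholder shifted views
movements = torch.randint(-3, 4, (8, 2)).float()   # (dx, dy) per sample
logits = model(shifted_images, movements)           # (8, 10) class scores
```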

Highlights

  • When you read this paper while sitting still at your desk, unperceived head and body adjustments, along with continuous eye movements—fixational eye movements [1]—jitter the visual image across arrays of photoreceptors in the retinas of the eyes

  • We explore the hypothesis that perception equates to the activity states of networks trained to classify “features” in the visual scene, and perceptual stability equates to robust classification of these features relative to self-generated movements, that is, a “what” type of information processing

  • We demonstrate in convolutional neural networks (CNNs) that neural signals related to eye and body movements support accurate image classification by making “where” type of computations—localization invariances—faster to learn and more robust to input perturbations



Introduction

When you read this paper while sitting still at your desk, unperceived head and body adjustments, along with continuous eye movements—fixational eye movements [1]—jitter the visual image across arrays of photoreceptors in the retinas of the eyes. One line of modeling work has linked the ability to accurately recognize objects during movements—which could support perceptual stability—to invariances for translations, rotations, and expansions learned directly from the statistics of the visual inputs. This class of models, e.g., unsupervised temporal learning models [7,8] and slow feature analysis models [9–14], has found supporting evidence in psychophysical [15,16] and physiological studies [7,17,18] and has inspired deep learning approaches that use unsupervised rules to learn coherent visual representations in the presence of moving stimuli, e.g., contrastive embedding [19,20]. These models are agnostic with respect to whether retinal activations are due to objects moving in the environment or to movements of the organism, with the latter characteristically defining the phenomenon of perceptual stability. Another line of work has hypothesized that extra-retinal signals produced during body movements, corollary discharges [21–24], could be used by brain networks for perceptual stabilization when retinal activations are due to movements of the eyes, head, and body, without affecting the percept of movement during changes in the environment.
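To make the idea of learning invariances directly from input statistics concrete, below is a minimal slowness-style objective in PyTorch, loosely in the spirit of the unsupervised temporal learning and slow feature analysis models cited above. The encoder, the variance term used to prevent collapse, and the one-pixel shift are illustrative assumptions, not any of the specific cited models.

```python
# Minimal sketch: an encoder is penalized when its representation changes
# between consecutive frames of the same (moving) stimulus, encouraging
# features that stay stable under small image shifts.
import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 8),
)

def slowness_loss(frame_t: torch.Tensor, frame_t1: torch.Tensor) -> torch.Tensor:
    z_t, z_t1 = encoder(frame_t), encoder(frame_t1)
    slow = ((z_t1 - z_t) ** 2).mean()                      # penalize fast-changing features
    spread = (1.0 - z_t.var(dim=0)).clamp(min=0).mean()    # keep features from collapsing to a constant
    return slow + spread

# Usage on two consecutive, slightly shifted views of the same scene.
frames_t = torch.randn(8, 1, 28, 28)
frames_t1 = torch.roll(frames_t, shifts=1, dims=-1)        # a one-pixel horizontal shift
loss = slowness_loss(frames_t, frames_t1)
loss.backward()
```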

