Abstract

While abundant in biology, foveated vision is nearly absent from computational models and especially deep learning architectures. Despite considerable hardware improvements, training deep neural networks still presents a challenge and constrains model complexity. Here we propose an end-to-end neural model for foveal-peripheral vision, inspired by retino-cortical mapping in primates and humans. Our model uses an efficient sampling technique to compress the visual signal, such that a small portion of the scene is perceived in high resolution while a large field of view is maintained in low resolution. An attention mechanism for performing “eye-movements” assists the agent in collecting detailed information incrementally from the observed scene. Our model achieves results comparable to those of a similar neural architecture trained on full-resolution data for image classification and outperforms it at video classification tasks. At the same time, because of the smaller size of its input, it can reduce computational effort tenfold and use several times less memory. Moreover, we present an easy-to-implement bottom-up and top-down attention mechanism which relies on task-relevant features and is therefore a convenient byproduct of the main architecture. Apart from its computational efficiency, the presented work provides a means for exploring active vision for agent training in simulated environments and anthropomorphic robotics.
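To make the foveal-peripheral idea concrete, the following is a minimal sketch, not the authors' retino-cortical mapping. It assumes a much simpler scheme: a full-resolution square crop around the fixation point plus an average-pooled copy of the whole frame. The function name `foveate` and the parameters `fovea_size` and `periphery_factor` are hypothetical, chosen for illustration only.

```python
import numpy as np

def foveate(image, center, fovea_size=32, periphery_factor=8):
    """Split a 2-D grayscale image into a sharp fovea and a coarse periphery.

    Illustrative assumption, not the paper's sampling scheme:
    - fovea: full-resolution crop around `center` (assumes even `fovea_size`
      no larger than the image)
    - periphery: the whole image average-pooled by `periphery_factor`
    """
    h, w = image.shape
    half = fovea_size // 2
    # Clamp the fixation point so the crop stays inside the image.
    cy = int(np.clip(center[0], half, h - half))
    cx = int(np.clip(center[1], half, w - half))
    fovea = image[cy - half:cy + half, cx - half:cx + half]

    # Average-pool non-overlapping blocks; trim edges that do not divide evenly.
    f = periphery_factor
    hp, wp = (h // f) * f, (w // f) * f
    periphery = image[:hp, :wp].reshape(hp // f, f, wp // f, f).mean(axis=(1, 3))
    return fovea, periphery

image = np.random.default_rng(0).random((256, 256))
fovea, periphery = foveate(image, center=(128, 128))
# Ratio of original pixel count to the pixels actually passed on.
compression = image.size / (fovea.size + periphery.size)
```

With these illustrative settings, a 256×256 frame is reduced to two 32×32 arrays. The numbers are arbitrary and are not meant to reproduce the savings reported in the abstract.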

Highlights

  • The biological visual system has served as a template and inspiration in Computer Vision in many ways

  • To guide eye movements efficiently, we propose an easy-to-implement attention mechanism based on feature saliency that can be used bottom-up to detect salient objects or top-down to let the agent locate or track a specific object

  • In this paper we propose a new deep learning vision model inspired by the structural organization of the primate retina
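The saliency-based attention described in the highlights can be sketched roughly as follows. This is an assumption-laden illustration rather than the mechanism from the paper: saliency is taken to be a weighted sum of absolute feature activations, the bottom-up mode weights all channels equally, and the top-down mode uses hypothetical task-specific `channel_weights`. The function name `next_fixation` is likewise made up for this sketch.

```python
import numpy as np

def next_fixation(features, image_shape, channel_weights=None):
    """Pick the next fixation point from a (C, H, W) feature map.

    Assumed scheme (not taken from the paper):
    - bottom-up: all channels contribute equally to saliency
    - top-down: `channel_weights` emphasises task-relevant channels
    Returns (row, col) in image coordinates.
    """
    c, h, w = features.shape
    if channel_weights is None:
        channel_weights = np.full(c, 1.0 / c)
    # Weighted sum over channels gives an (H, W) saliency map.
    saliency = np.tensordot(channel_weights, np.abs(features), axes=1)
    row, col = np.unravel_index(np.argmax(saliency), saliency.shape)
    # Map the centre of the winning cell back to image coordinates.
    scale_y, scale_x = image_shape[0] / h, image_shape[1] / w
    return int((row + 0.5) * scale_y), int((col + 0.5) * scale_x)

features = np.zeros((2, 4, 4))
features[0, 1, 2] = 1.0   # weak response in channel 0
features[1, 3, 0] = 5.0   # strong response in channel 1

bottom_up = next_fixation(features, image_shape=(64, 64))
top_down = next_fixation(features, (64, 64), channel_weights=np.array([1.0, 0.0]))
```

In this toy example the bottom-up call fixates the strongest response overall, while the top-down call ignores channel 1 and fixates the location favoured by channel 0.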


Summary

Introduction

The biological visual system has served as a template and inspiration in Computer Vision in many ways. Improvements in hardware have allowed Deep Learning and Convolutional Neural Networks (CNNs), which now dominate vision models in this field, to attract considerable interest from the research community (O’Mahony et al., 2019). The layered structure and information processing mechanisms that CNNs rely on resemble biological systems more closely than previously used machine learning approaches do. Despite the success of deep neural networks in modelling the biological visual system, the two still differ qualitatively in several properties as well as in performance.

