Abstract

We present an investigation into adopting a model of the retino-cortical mapping, found in biological visual systems, to improve the efficiency of image analysis using Deep Convolutional Neural Nets (DCNNs) in the context of robot vision and egocentric perception systems. This work enables DCNNs to process input images approaching one million pixels in size, in real time, in a single pass of the DCNN, using only consumer-grade graphics processor (GPU) hardware.

Highlights

  • Deep Learning methods have revolutionised signal and image analysis, and end-to-end approaches to training these networks can achieve the state of the art in vision-based control (Viereck et al., 2017; Morrison et al., 2018) and recognition for robotics

  • In order to evaluate the performance of the retinal subsampling mechanism and the cortical image representation in isolation, three Deep Convolutional Neural Nets (DCNNs) were trained, each with the same architecture but each using a different subset of the dataset built in the previous section

  • We have confirmed the utility of the functional architecture of the human visual pathway, as predicted by Schwartz and others, by investigating retino-cortical mapping models within implementations of computer vision systems based on Deep Learning


Introduction

Deep Learning methods have revolutionised signal and image analysis, and end-to-end approaches to training these networks can achieve the state of the art in vision-based control (Viereck et al., 2017; Morrison et al., 2018) and recognition for robotics. A real obstacle to the practical adoption of DCNNs is their requirement for very large training data sets and their inability to scale to process image matrices larger than approximately 300 × 300 px in a single pass. We address this issue directly by adopting a computational model of the space-variant, i.e., foveated, visual processing architecture found in the mammalian vision system (Schwartz, 1977). Our 50K-node retina pre-processor enables current DCNNs to process input images of 930 × 930 px, using only consumer-grade graphics processor (GPU) hardware, in a single pass of the DCNN, and this retina pre-processing approach has the potential to scale to accommodate larger input image sizes. The pre-processor mapping also confers a degree of scale and rotation invariance on the transformed images, which facilitates a number of perception tasks and reduces the parameter count and computation required to train a DCNN.
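The space-variant mapping described above can be illustrated with a log-polar sampler in the spirit of Schwartz's model. This is a minimal NumPy sketch, not the authors' 50K-node retina implementation: the function name, grid sizes, and nearest-neighbour sampling are all illustrative assumptions, chosen only to show how ring (log-radius) and wedge (angle) indices become the two axes of a "cortical" image.

```python
import numpy as np

def log_polar_sample(image, n_rings=64, n_wedges=64, r_min=1.0):
    """Sample a grayscale image on a log-polar grid centred on the image
    midpoint, approximating a retino-cortical mapping (illustrative sketch)."""
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)
    # Log-spaced radii: dense sampling near the "fovea", sparse in the periphery.
    radii = np.exp(np.linspace(np.log(r_min), np.log(r_max), n_rings))
    angles = np.linspace(0.0, 2.0 * np.pi, n_wedges, endpoint=False)
    rr, aa = np.meshgrid(radii, angles, indexing="ij")
    # Nearest-neighbour lookup of the retinal samples (a real retina model
    # would pool over receptive fields rather than pick single pixels).
    ys = np.clip(np.round(cy + rr * np.sin(aa)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + rr * np.cos(aa)).astype(int), 0, w - 1)
    return image[ys, xs]  # shape (n_rings, n_wedges)

# A 930 x 930 px input collapses to a small fixed-size cortical image,
# which is what lets a standard DCNN consume it in a single pass.
# Rotation about the centre cyclically shifts the wedge axis, and uniform
# scaling shifts the ring axis, so both become (approximate) translations.
img = np.random.default_rng(0).random((930, 930))
cortical = log_polar_sample(img)
print(cortical.shape)  # (64, 64)
```

The key design point the sketch makes concrete: because convolutions are translation-equivariant, turning rotation and scale into translations in the cortical representation is what gives the downstream DCNN its degree of rotation and scale invariance.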

