Abstract
We present an investigation into adopting a model of the retino-cortical mapping, found in biological visual systems, to improve the efficiency of image analysis using Deep Convolutional Neural Nets (DCNNs) in the context of robot vision and egocentric perception systems. This work enables DCNNs to process input images approaching one million pixels in size, in real time and in a single pass, using only consumer-grade graphics processing unit (GPU) hardware.
Highlights
Deep Learning methods have revolutionised signal and image analysis, and end-to-end approaches to training these networks can achieve state-of-the-art performance in vision-based control (Viereck et al., 2017; Morrison et al., 2018) and recognition for robotics
In order to evaluate the performance of the retinal subsampling mechanism and the cortical image representation in isolation, three Deep Convolutional Neural Nets (DCNNs) were trained, each with the same architecture but using a different subset of the dataset built in the previous section
We have confirmed the utility of the functional architecture of the human visual pathway, as predicted by Schwartz and others, by investigating retino-cortical mapping models within implementations of computer vision systems based on Deep Learning
Summary
Deep Learning methods have revolutionised signal and image analysis, and end-to-end approaches to training these networks can achieve state-of-the-art performance in vision-based control (Viereck et al., 2017; Morrison et al., 2018) and recognition for robotics. A real obstacle to the practical adoption of DCNNs is their requirement for very large training datasets and their inability to scale to process image matrices larger than approximately 300 × 300 px in a single pass. We address this issue directly by adopting a computational model of the space-variant, i.e. foveated, visual processing architecture found in the mammalian vision system (Schwartz, 1977). Our 50K-node retina pre-processor enables current DCNNs to process input images of 930 × 930 px, using only consumer-grade graphics processing unit (GPU) hardware, in a single pass of the DCNN, and this retina pre-processing approach has the potential to scale to larger input image sizes. The pre-processor mapping confers a degree of scale and rotation invariance on the transformed images, a signal simplification that facilitates a number of perception tasks while reducing the parameter count and computation required to train a DCNN.
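To make the idea concrete, the sketch below resamples an image onto a log-polar grid in plain NumPy, a crude stand-in for the complex-log retino-cortical mapping Schwartz describes. This is illustrative only: the paper's actual pre-processor is a 50K-node software retina, and the function and parameter names here (log_polar_sample, out_rings, out_wedges, r_min) are assumptions of this sketch, not the authors' implementation.

import numpy as np

def log_polar_sample(img, out_rings=64, out_wedges=128, r_min=1.0):
    """Resample an image onto a log-polar ("cortical") grid.

    Rings are spaced exponentially in radius (dense at the fovea,
    sparse in the periphery); wedges are spaced uniformly in angle.
    """
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0  # fixation point: image centre
    r_max = min(cy, cx)
    # Equal steps in log(r) give exponentially spaced sampling rings.
    radii = np.exp(np.linspace(np.log(r_min), np.log(r_max), out_rings))
    thetas = np.linspace(0.0, 2.0 * np.pi, out_wedges, endpoint=False)
    rr, tt = np.meshgrid(radii, thetas, indexing="ij")
    ys = np.clip(np.rint(cy + rr * np.sin(tt)).astype(int), 0, h - 1)
    xs = np.clip(np.rint(cx + rr * np.cos(tt)).astype(int), 0, w - 1)
    return img[ys, xs]  # shape: (out_rings, out_wedges, channels)

# A 930 x 930 input collapses to a small fixed-size "cortical image"
# that a standard DCNN can consume in a single pass.
frame = np.random.randint(0, 256, (930, 930, 3), dtype=np.uint8)
cortical = log_polar_sample(frame)  # -> (64, 128, 3)
print(cortical.shape)

Under this mapping, scaling the input about the fixation point becomes a shift along the ring axis of the cortical image, and rotating it becomes a circular shift along the wedge axis, which is the source of the scale and rotation invariance noted above.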