Abstract

Computer vision is currently one of the most exciting and rapidly evolving fields of science, affecting numerous industries. Research and development breakthroughs, mainly in the field of convolutional neural networks (CNNs), have opened the way to unprecedented sensitivity and precision in object detection and recognition tasks. Nevertheless, findings in recent years on the sensitivity of neural networks to additive noise, light conditions, and the completeness of the training dataset indicate that this technology still lacks the robustness needed by the autonomous robotics industry. In an attempt to bring computer vision algorithms closer to the capabilities of a human operator, the mechanisms of the human visual system were analyzed in this work. Recent studies show that the mechanisms behind the recognition process in the human brain include continuous generation of predictions based on prior knowledge of the world. These predictions enable rapid generation of contextual hypotheses that bias the outcome of the recognition process. This mechanism is especially advantageous in situations of uncertainty, when the visual input is ambiguous. In addition, the human visual system continuously updates its knowledge about the world based on the gaps between its predictions and the visual feedback. CNNs are feed-forward in nature and lack such top-down contextual attenuation mechanisms. As a result, although they process massive amounts of visual information during their operation, this information is not transformed into knowledge that can be used to generate contextual predictions and improve their performance. In this work, an architecture was designed that aims to integrate the concepts behind the top-down prediction and learning processes of the human visual system with state-of-the-art bottom-up object recognition models, e.g., deep CNNs. The work focuses on two mechanisms of the human visual system: anticipation-driven perception and reinforcement-driven learning. Imitating these top-down mechanisms and combining them with state-of-the-art bottom-up feed-forward algorithms resulted in an accurate, robust, and continuously improving target recognition model.
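
The abstract describes these two mechanisms only conceptually. The following is a minimal, hypothetical Python sketch, not the authors' implementation: the class list, the multiplicative fusion rule, and the learning rate are illustrative assumptions. It shows one way a top-down contextual prior could bias bottom-up CNN class scores (anticipation-driven perception) and be updated from confirmed outcomes (reinforcement-driven learning).

```python
import numpy as np

# Hypothetical sketch only: the class list, the fusion rule, and the
# learning rate are illustrative assumptions, not values from the paper.
CLASSES = ["car", "pedestrian", "traffic_light", "dog"]

class ContextualRecognizer:
    def __init__(self, n_classes, lr=0.05):
        # Top-down expectation of how likely each class is in the
        # current scene context (uniform before any experience).
        self.context_prior = np.full(n_classes, 1.0 / n_classes)
        self.lr = lr

    def recognize(self, cnn_logits):
        """Anticipation-driven perception: bias the bottom-up CNN
        evidence by the top-down contextual prior."""
        cnn_probs = np.exp(cnn_logits - cnn_logits.max())  # softmax
        cnn_probs /= cnn_probs.sum()
        fused = cnn_probs * self.context_prior
        return fused / fused.sum()

    def update(self, confirmed_class_idx):
        """Reinforcement-driven learning: move the prior toward the
        class that the visual feedback actually confirmed."""
        target = np.zeros_like(self.context_prior)
        target[confirmed_class_idx] = 1.0
        self.context_prior += self.lr * (target - self.context_prior)
        self.context_prior /= self.context_prior.sum()

# Usage: after the context has repeatedly confirmed pedestrians,
# ambiguous bottom-up evidence ("pedestrian" vs. "dog") is resolved
# in favor of the contextually expected class.
model = ContextualRecognizer(len(CLASSES))
for _ in range(20):
    model.update(CLASSES.index("pedestrian"))
ambiguous_logits = np.array([0.1, 2.0, 0.1, 2.1])  # raw CNN scores
print(model.recognize(ambiguous_logits).round(3))
```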

Highlights

  • While many technological gaps inhibit the development of the autonomous systems field, a significant factor in this delay is that it has been much harder than expected to give robotic agents the capability to analyze their ever-changing environment, detect and classify the objects surrounding them, and interpret the interactions among them

  • The pathways in the Visual Associative Predictive (VAP) model correspond to those of the human visual system

  • The 17,256 images from the SUN2013 dataset that were not used for training were classified using both the FRCNN with the VGG16 model and the VAP model

Introduction

While many technological gaps inhibit the development of the autonomous systems field, there is no doubt that a significant factor in this delay is that it has been much harder than expected to give robotic agents the capability to analyze their ever-changing environment, detect and classify the objects surrounding them, and interpret the interactions among them. State-of-the-art computer vision algorithms, convolutional neural networks (CNNs), although they achieve remarkable results in object detection and classification challenges, are still not robust enough for many applications. They are sensitive to ambient light conditions [1] and to additive noise [2] as a result of pockets in their manifold. These algorithms are based on a bottom-up object detection and recognition process. They do not include the means to use top-down contextual information for a more holistic process.
