Abstract

Neural network-based solutions have revolutionized the field of computer vision by achieving outstanding performance in a number of applications. Yet, while these deep learning models outclass previous methods, they still have significant shortcomings in generalization and in robustness to input disturbances such as occlusion. Most existing methods that tackle the latter problem use passive neural network architectures that cannot act on, and thus cannot influence, the observed scene. In this paper, we argue that an active observer agent may be able to achieve superior performance by changing the parameters of the scene, in particular by moving to a different position in the scene to avoid occlusion. To demonstrate this, we introduce a reinforcement learning environment that implements OpenAI Gym’s interface and allows the creation of synthetic scenes with realistic occlusion. The environment is implemented using differentiable rendering, which allows direct gradient-based optimization of the camera position. We also present two additional methods: one uses self-supervised learning to predict occlusion segments and optimal camera positions, while the other learns to avoid occlusion using reinforcement learning. We present comparative experiments to demonstrate the efficiency of the proposed methods. Bayesian t-tests show that the neural network-based methods credibly outperform the gradient-based avoidance strategy, avoiding occlusion in 5.0 fewer steps on average in multi-object scenes.
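
The idea of gradient-based occlusion avoidance mentioned in the abstract can be illustrated with a toy sketch. This is not the paper's environment or method: the differentiable renderer is replaced here by a hand-written smooth occlusion score, the camera is assumed to move on a circle around the target, and all names (`occlusion`, `occlusion_grad`, `avoid_occlusion`) and parameter values are hypothetical choices made for illustration.

```python
import math


def _wrapped_offset(theta, occluder_angle):
    """Signed angular distance from the occluder, wrapped to (-pi, pi]."""
    diff = theta - occluder_angle
    return math.atan2(math.sin(diff), math.cos(diff))


def occlusion(theta, occluder_angle=0.0, width=0.5):
    """Assumed smooth surrogate for occlusion: 1.0 when the camera looks
    straight through the occluder, decaying toward 0 as it moves away."""
    d = _wrapped_offset(theta, occluder_angle)
    return math.exp(-d * d / (2.0 * width * width))


def occlusion_grad(theta, occluder_angle=0.0, width=0.5):
    """Analytic derivative of the surrogate with respect to the camera angle."""
    d = _wrapped_offset(theta, occluder_angle)
    return -d / (width * width) * occlusion(theta, occluder_angle, width)


def avoid_occlusion(theta, lr=0.5, threshold=0.05, max_steps=100):
    """Gradient descent on the camera angle until occlusion is below threshold.

    Returns the final angle and the number of steps taken. A camera that
    starts exactly behind the occluder has zero gradient and will not move,
    which is one known weakness of purely gradient-based strategies.
    """
    steps = 0
    while occlusion(theta) > threshold and steps < max_steps:
        theta -= lr * occlusion_grad(theta)
        steps += 1
    return theta, steps


final_theta, n_steps = avoid_occlusion(theta=0.2)
print(f"final angle: {final_theta:.3f} rad, steps: {n_steps}")
```

The step count returned by `avoid_occlusion` plays the role of the "number of steps" metric in this toy setting; a learned policy would be compared against it by counting steps in the same way.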
