Abstract

Deep learning is the mainstream paradigm in computer vision and machine learning, but its performance often falls short of expectations in robot vision applications. The problem is that robot sensing is inherently active and, for many application domains, relevant data is scarce. This calls for novel deep learning approaches that offer good performance at a lower data cost. Here we address monocular depth estimation in warehouse automation with new methods and three different deep architectures. Our results suggest that incorporating sensor models and prior knowledge about robotic active vision can consistently improve results and learning performance from fewer training samples than standard data-driven deep learning requires.

Highlights

  • In recent years, deep learning has become the mainstream paradigm in computer vision and machine learning [1]

  • We propose three deep network architectures in which information from a modelled, parameterized sensor is incorporated sequentially into the design: the estimation of image displacements based on the camera model; the estimation of optical flow by correlating two consecutive images and then correlating the result with the change in camera position; and the estimation of the camera displacement from a depth image (see the sketches after this list)

  • To combine two different neural networks effectively, we introduce a brightness error and a sub-convolutional neural network that branches from the middle of the depth network and computes a relative pose
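To make the first design concrete: given a depth map and a relative camera pose, the camera model predicts where each pixel moves between consecutive frames, and warping the next frame back through that displacement yields a brightness (photometric) error. The sketch below is a minimal PyTorch illustration under assumed tensor shapes; the function name `brightness_error` and its signature are our own, not the paper's.

```python
# Minimal sketch, assuming a pinhole camera model and known relative pose.
# Warps frame t+1 into frame t via predicted depth and penalises brightness
# differences. Names, shapes, and the L1 penalty are illustrative assumptions.
import torch
import torch.nn.functional as F

def brightness_error(img_t, img_t1, depth_t, K, T_t_to_t1):
    """L1 photometric error between img_t and img_t1 warped into frame t.
    img_*: (B,3,H,W), depth_t: (B,1,H,W), K: (B,3,3), T_t_to_t1: (B,4,4)."""
    b, _, h, w = img_t.shape
    # Pixel grid in homogeneous coordinates, shape (B, 3, H*W)
    ys, xs = torch.meshgrid(torch.arange(h, dtype=img_t.dtype),
                            torch.arange(w, dtype=img_t.dtype), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0)
    pix = pix.view(1, 3, -1).expand(b, 3, -1)
    # Back-project to 3D, apply the relative pose, re-project with K
    cam = torch.linalg.inv(K) @ pix * depth_t.view(b, 1, -1)
    cam_h = torch.cat([cam, torch.ones(b, 1, h * w, dtype=img_t.dtype)], dim=1)
    proj = K @ (T_t_to_t1 @ cam_h)[:, :3]
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)
    # Normalise coordinates to [-1, 1] for grid_sample and warp
    u = 2 * uv[:, 0] / (w - 1) - 1
    v = 2 * uv[:, 1] / (h - 1) - 1
    grid = torch.stack([u, v], dim=-1).view(b, h, w, 2)
    warped = F.grid_sample(img_t1, grid, align_corners=True)
    return (img_t - warped).abs().mean()
```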
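The second design correlates features of two consecutive images. A minimal sketch of such a local correlation (cost-volume) layer, in the style popularized by FlowNetC, follows; the function name `local_correlation` and the displacement radius `max_disp` are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: correlate each location of feat1 with a local
# neighbourhood of feat2, producing a cost volume that a flow decoder
# can consume. Shapes and the radius of 4 are illustrative.
import torch
import torch.nn.functional as F

def local_correlation(feat1, feat2, max_disp=4):
    """Returns a cost volume of shape (B, (2*max_disp+1)**2, H, W)."""
    b, c, h, w = feat1.shape
    feat2 = F.pad(feat2, [max_disp] * 4)          # pad so every shift is valid
    volumes = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = feat2[:, :, dy:dy + h, dx:dx + w]
            volumes.append((feat1 * shifted).mean(dim=1, keepdim=True))
    return torch.cat(volumes, dim=1)

# Example: features extracted from two consecutive frames
f1 = torch.randn(1, 64, 32, 48)
f2 = torch.randn(1, 64, 32, 48)
cost = local_correlation(f1, f2)                  # (1, 81, 32, 48)
```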
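Finally, the relative pose mentioned above can be regressed by a small sub-network branching off mid-level depth features. The sketch below assumes a 6-DoF parameterization (three translations, three rotations); the class name `PoseHead`, the input channel count, and the layer sizes are hypothetical.

```python
# Hypothetical sketch of a pose head regressing a 6-DoF relative camera
# displacement from mid-level depth features. Layer sizes are assumptions.
import torch
import torch.nn as nn

class PoseHead(nn.Module):
    def __init__(self, in_channels=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.fc = nn.Linear(256, 6)  # (tx, ty, tz, rx, ry, rz)

    def forward(self, feats):
        x = self.conv(feats).mean(dim=(2, 3))   # global average pooling
        return self.fc(x)

pose = PoseHead()(torch.randn(1, 128, 16, 24))  # -> (1, 6)
```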



Introduction

Deep learning has become the mainstream paradigm in computer vision and machine learning [1]. Following this trend, more and more approaches using deep learning have been proposed to address different sensing problems in robotics. Robots pose a number of challenges for this methodology that relate to the fact that robot sensing is inherently active [2]. This active nature offers opportunities that have been exploited for years in the context of active vision [3]; for instance, more information can be extracted from sensory signals by incorporating knowledge about the regular relationship between them and concurrent motor actions [4]. Data-intensive computer vision, by contrast, relies primarily on enormous amounts of decontextualized images.
