Abstract

Depth perception is an invaluable source of information for various vision tasks. However, depth maps acquired with consumer-level sensors still suffer from non-negligible noise. This has recently motivated researchers to exploit both traditional filters and the deep learning paradigm in order to suppress this non-uniform noise while preserving geometric detail. Despite these efforts, deep depth denoising remains an open challenge, mainly due to the lack of clean data that could serve as ground truth. In this paper, we propose a fully convolutional deep autoencoder that learns to denoise depth maps, overcoming the lack of ground truth data. Specifically, the proposed autoencoder exploits multiple views of the same scene, captured from different viewpoints, to learn to suppress noise in a self-supervised, end-to-end manner, using both depth and color information during training but only depth during inference. To enforce self-supervision, we leverage differentiable rendering to obtain photometric supervision, which is further regularized with geometric and surface priors. As the proposed approach relies on raw data acquisition, a large RGB-D corpus is collected using Intel RealSense sensors. Complementary to a quantitative evaluation, we demonstrate the effectiveness of the proposed self-supervised denoising approach on established 3D reconstruction applications. Code is available at https://github.com/VCL3D/DeepDepthDenoising
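For intuition, the following is a minimal PyTorch sketch of the kind of multi-view photometric supervision described above: the predicted (denoised) depth of one view is back-projected to 3D, transformed to a neighboring view, and used to sample that view's color, with the photometric discrepancy acting as the loss. The intrinsics K, relative pose (R, t), the plain L1 penalty, and all names are illustrative assumptions, not the authors' implementation, which additionally employs differentiable rendering and geometric/surface regularizers.

```python
import torch
import torch.nn.functional as F

def backproject(depth, K_inv):
    # depth: (B, 1, H, W) predicted depth; K_inv: (B, 3, 3) inverse intrinsics.
    B, _, H, W = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype, device=depth.device),
        torch.arange(W, dtype=depth.dtype, device=depth.device),
        indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(1, 3, -1)
    rays = K_inv @ pix                           # viewing rays, (B, 3, H*W)
    return rays * depth.reshape(B, 1, -1)        # 3D points in the source frame

def photometric_loss(depth_src, img_src, img_tgt, K, K_inv, R, t):
    # Warp the target view's color into the source view via the predicted
    # depth and penalize the photometric discrepancy (L1 here for brevity).
    # R: (B, 3, 3) and t: (B, 3, 1) map source-frame points to the target frame.
    B, _, H, W = depth_src.shape
    pts = R @ backproject(depth_src, K_inv) + t  # points in the target frame
    proj = K @ pts                               # homogeneous pixel coordinates
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)
    # Normalize pixel coordinates to [-1, 1], as required by grid_sample.
    u = 2.0 * uv[:, 0] / (W - 1) - 1.0
    v = 2.0 * uv[:, 1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).reshape(B, H, W, 2)
    img_warp = F.grid_sample(img_tgt, grid, align_corners=True)
    return (img_warp - img_src).abs().mean()
```

Because only the predicted depth is differentiable here, gradients of the photometric loss flow back into the denoising network, while the color streams are needed only at training time, consistent with depth-only inference.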

Highlights

  • Depth sensing serves as an important information cue for a wide range of vision tasks

  • A new corpus of more than 10K quadruples of RGB-D frames has been collected using multiple Intel RealSense D415 devices

  • For the sake of spatiotemporal alignment between the sensor's color and depth streams, the RGB stream of the infrared stereo imager was used instead of the extra RGB-only camera (see the capture sketch below). This guarantees alignment between the color and depth image domains and circumvents a technical limitation of the sensors, which do not offer precise hardware synchronization between the stereo pair and the RGB camera
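As a concrete illustration of this capture setup, the sketch below configures the depth and left-infrared streams of a single device via the official pyrealsense2 bindings. The resolution, frame rate, and 8-bit Y8 infrared format are illustrative assumptions and do not reproduce the paper's recording settings.

```python
# Minimal single-device capture sketch (assumed settings, not the paper's).
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
# Depth and the left infrared image come from the same stereo pair, so they
# are spatially registered and share the sensor's hardware clock.
config.enable_stream(rs.stream.depth, 1280, 720, rs.format.z16, 30)
config.enable_stream(rs.stream.infrared, 1, 1280, 720, rs.format.y8, 30)

pipeline.start(config)
try:
    frames = pipeline.wait_for_frames()
    depth = frames.get_depth_frame()
    ir_left = frames.get_infrared_frame(1)  # index 1 = left imager
finally:
    pipeline.stop()
```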

Introduction

Upon the advent of consumer-grade depth sensors, the research community has exploited the availability of depth information to make performance leaps in a variety of domains. These include SLAM technology for robotic navigation, as well as static scene capture and tracking. Depth sensors can be categorized either by their interaction with the observed scene, into passive (pure observation) and active (observation after actuation), or by their technological basis, into stereo, structured light (SL), and time-of-flight (ToF). While the latter two are active by definition, stereo-based sensors can operate in both passive and active mode, as they estimate depth via binocular observation and triangulation. All of the aforementioned sensor types suffer from high levels of noise and structural artifacts.

