A Learned Stereo Depth System for Robotic Manipulation in Homes

Krishna Shankar, Mark Tjersland, Kevin Stone, Max Bajracharya, Jeremy Ma

doi:10.1109/lra.2022.3143895

Open Access

Abstract

We present a passive stereo depth system that produces dense, accurate point clouds optimized for human environments, including dark, textureless, thin, reflective, and specular surfaces and objects. It runs at 2560 × 2048 resolution with 384 disparities in 30 ms. The system consists of an algorithm combining learned stereo matching with engineered filtering, a training and data-mixing methodology, and a sensor hardware design. Our architecture is 15× faster than approaches that perform similarly on the Middlebury and FlyingThings stereo benchmarks. To supervise training of this model effectively, we combine real data labeled using off-the-shelf depth sensors with several rendered, simulated labeled datasets. We demonstrate the efficacy of our system through a large number of qualitative results in the form of depth maps and point clouds, experiments validating its metric accuracy, and comparisons to other sensors on challenging objects and scenes. We also show that our algorithm is competitive with state-of-the-art learned models on the Middlebury and FlyingThings datasets.
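
For readers unfamiliar with how a disparity map becomes a point cloud, the following is a minimal sketch of the standard rectified pinhole back-projection, Z = f·B/d. It is not the authors' implementation. The function name and the parameters `focal_px`, `baseline_m`, `cx`, and `cy` are hypothetical, and the abstract does not state the system's focal length, baseline, or principal point.

```python
def disparity_to_points(disparity, focal_px, baseline_m, cx, cy):
    """Back-project a rectified disparity map into a list of (x, y, z) points.

    disparity  : 2D list of disparities in pixels (rows of columns)
    focal_px   : focal length in pixels (assumed value, not from the paper)
    baseline_m : distance between the two cameras in metres (assumed)
    cx, cy     : principal point in pixels (assumed)
    """
    points = []
    for v, row in enumerate(disparity):      # v is the image row
        for u, d in enumerate(row):          # u is the image column
            if d <= 0:
                # Zero or negative disparity has no finite depth; skip it.
                continue
            z = focal_px * baseline_m / d    # depth from similar triangles
            x = (u - cx) * z / focal_px      # lateral offset from optical axis
            y = (v - cy) * z / focal_px      # vertical offset from optical axis
            points.append((x, y, z))
    return points


# Usage with made-up calibration values.
cloud = disparity_to_points([[10, 0], [20, 40]], 1000.0, 0.1, 0.0, 0.0)
```

Under this model the maximum disparity bounds the nearest measurable depth, Z_min = f·B/d_max, so a larger disparity range such as the 384 quoted above allows closer objects to be measured for a given focal length and baseline.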
