Abstract

Depth completion is an important task in computer vision and robotics that aims to predict accurate dense depth from a single RGB image and its sparse LiDAR measurements. Convolutional neural networks (CNNs) have been widely used for depth completion to learn a mapping from sparse to dense depth. However, recent methods rely mainly on sophisticated CNN architectures and do not exploit any 3D geometric cues during the inference stage. In this paper, we present a cascaded, geometrically inspired learning framework for depth completion, consisting of three stages: view extrapolation, stereo matching, and depth refinement. The first stage extrapolates a virtual (right) view from a single RGB (left) image and its LiDAR data. We then mimic binocular stereo matching between the left view and the extrapolated right view and, as a result, explicitly encode geometric constraints during depth completion. This stage augments the final refinement process by providing additional geometric reasoning. We also introduce a distillation framework based on a teacher-student strategy to train our network effectively. Knowledge from a teacher model privileged with real stereo pairs is transferred to the student through feature distillation. Experimental results on the KITTI depth completion benchmark demonstrate that the proposed method outperforms state-of-the-art methods.
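
The geometric idea behind the view extrapolation and stereo matching stages can be illustrated with the standard rectified-stereo relation d = f * B / Z, where d is disparity in pixels, f the focal length in pixels, B the baseline in meters, and Z the depth in meters. The sketch below is not the proposed network, which learns the extrapolation; it only shows how sparse left-view depth samples would shift into a virtual right view under that relation. The function names, focal length, baseline, and sample points are hypothetical values chosen for illustration.

```python
def depth_to_disparity(depth_m, focal_px, baseline_m):
    """Disparity in pixels for a rectified stereo pair: d = f * B / Z."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline_m / depth_m


def warp_sparse_points_to_right(points, focal_px, baseline_m, width):
    """Shift sparse left-view samples (x, y, depth) into a virtual right view.

    In a rectified pair, a left pixel at column x appears at column x - d
    in the right image. Samples that fall outside the image are dropped.
    """
    warped = []
    for x, y, depth_m in points:
        d = depth_to_disparity(depth_m, focal_px, baseline_m)
        x_right = x - d
        if 0 <= x_right < width:
            warped.append((x_right, y, depth_m))
    return warped


# Hypothetical camera parameters and LiDAR samples (assumptions, not from the paper).
FOCAL_PX = 720.0
BASELINE_M = 0.5
lidar_points = [(400.0, 180.0, 10.0), (20.0, 200.0, 5.0)]

right_points = warp_sparse_points_to_right(
    lidar_points, FOCAL_PX, BASELINE_M, width=1242
)
print(right_points)
```

Nearby points shift more than distant ones, which is the cue that stereo matching recovers; the second sample shifts past the left image border and is dropped, one reason a purely geometric warp leaves holes that a learned extrapolation stage would need to fill.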
