Abstract

Depth estimation is a critical problem in robotics applications, especially autonomous driving. Currently, depth prediction based on binocular stereo matching and depth completion based on the fusion of a monocular image and a laser point cloud are the two mainstream methods. However, the former usually suffers from a lack of constraints while building the cost volume, and the latter cannot be trained in a self-supervised way and does not exploit the geometric constraints of stereo matching, which we believe would further improve performance. Therefore, we propose a novel multimodal neural network, namely UAMD-Net, for dense depth completion based on the fusion of binocular stereo matching and the weak constraint from sparse point clouds. Specifically, the sparse point clouds are converted to a sparse depth map and fed, together with the binocular images, into the multimodal feature encoder (MFE), constructing a cross-modal cost volume. This volume is then further processed by the multimodal feature aggregator (MFA) and the depth regression layer. Furthermore, since previous multimodal depth estimation methods ignore the problem of modality dependence, we propose a new training strategy called random modality dropout (RMD), which enables the network to be trained adaptively with multiple modality inputs and to run inference with a specific subset of modality inputs. Benefiting from the flexible network structure and the adaptive training method, the proposed network achieves unified training under various modality input conditions. Comprehensive experiments conducted on the KITTI and DrivingStereo depth completion datasets demonstrate that our method produces robust results and outperforms other state-of-the-art methods.
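The abstract describes random modality dropout (RMD) only at a high level. The following is a minimal sketch of how such a strategy could look in training code; the modality names, the drop probability, and the function name are illustrative assumptions, not details taken from the paper.

import random

def random_modality_dropout(batch, drop_prob=0.3):
    """Randomly withhold one optional modality from a training batch.

    `batch` is assumed to be a dict with array-like entries such as
    'left_image', 'right_image', and 'sparse_depth' (hypothetical keys).
    Zeroing a modality keeps the network architecture fixed while removing
    its signal, so the model learns to cope with missing inputs at inference.
    """
    droppable = ['right_image', 'sparse_depth']  # assume the left image is always kept
    if random.random() < drop_prob:
        key = random.choice(droppable)
        batch = dict(batch)
        batch[key] = batch[key] * 0  # works for NumPy arrays and PyTorch tensors
    return batch

Applied to every batch during training, this exposes the network to stereo-only, LiDAR-plus-monocular, and full multimodal input conditions, which is the behavior the abstract attributes to RMD.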
