Abstract

Multiple-view stereo has potential applications in robotic operations and autonomous driving (unstructured environment reconstruction, visual servoing). With auxiliary depth information, inertial navigation systems can achieve precise navigation, which is especially valuable in complex environments where GPS fails. Accurate depth estimation remains a challenge in low-textured or occluded regions. To alleviate incorrect depth inference, this paper presents a multi-stage pixel-visibility learning-based stereo network. Its improvements are as follows: 1) a new content-adaptive cost volume aggregation mechanism based on neighboring pixel-wise visibility is designed to produce more accurate and smoother depth map predictions at object boundaries; 2) a global convolution block and a boundary refinement block are developed to regularize the cost volume; they learn the inherent constraints of feature-matching correspondence and effectively mitigate depth estimation uncertainty in low-textured regions; 3) a new loss function is designed to measure the uncertainty of the predicted probability distribution and enhance the reliability of depth map inference. Experimental results on the indoor DTU dataset and the outdoor Tanks & Temples dataset indicate that our method achieves superior performance and strong generalization ability, comparable to state-of-the-art works.

Note to Practitioners—Multiple-view stereo (MVS) estimates dense 3D representations of scenes and is widely used in autonomous driving, robotic navigation, virtual reality (VR), and augmented reality (AR). Aiming at the problem of incorrect depth inference in low-textured or occluded regions, this work proposes a novel multi-stage depth prediction method based on neighboring pixel-wise visibility. Our method not only achieves accurate depth estimation for robot perception but also makes no concession to real-time performance. The proposed method thus has good potential in 3D reconstruction, robotic navigation, and VR/AR to provide accurate real-time depth estimation with limited memory consumption.
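To make the two central ideas of the abstract concrete, the sketch below illustrates visibility-weighted cost volume aggregation and an entropy-style uncertainty loss in PyTorch. This is a minimal illustration under assumed tensor shapes, not the authors' implementation: the function names, the per-view weight normalization, and the entropy form of the loss are all our assumptions for exposition.

    # Hypothetical sketch: visibility-weighted cost aggregation and an
    # uncertainty loss on the depth probability volume. Names and shapes
    # are illustrative assumptions, not the paper's actual API.
    import torch
    import torch.nn.functional as F

    def aggregate_cost_volume(per_view_costs: torch.Tensor,
                              visibility: torch.Tensor) -> torch.Tensor:
        """Fuse per-source-view matching costs into one cost volume.

        per_view_costs: (V, D, H, W) matching cost for V source views
                        over D depth hypotheses.
        visibility:     (V, 1, H, W) predicted per-pixel visibility
                        weights in [0, 1].
        Returns:        (D, H, W) visibility-weighted aggregated costs.
        """
        # Normalize weights across views so occluded views contribute less.
        weights = visibility / (visibility.sum(dim=0, keepdim=True) + 1e-6)
        return (per_view_costs * weights).sum(dim=0)

    def entropy_uncertainty_loss(prob_volume: torch.Tensor) -> torch.Tensor:
        """Penalize high-entropy (uncertain) depth distributions.

        prob_volume: (D, H, W) softmax probabilities over D hypotheses.
        """
        entropy = -(prob_volume * torch.log(prob_volume + 1e-8)).sum(dim=0)
        return entropy.mean()

    # Minimal usage example with random tensors.
    V, D, H, W = 4, 48, 32, 40
    costs = torch.rand(V, D, H, W)
    vis = torch.rand(V, 1, H, W)
    volume = aggregate_cost_volume(costs, vis)   # (D, H, W)
    probs = F.softmax(-volume, dim=0)            # low cost -> high probability
    loss = entropy_uncertainty_loss(probs)
    print(volume.shape, loss.item())

A sharply peaked probability distribution over depth hypotheses yields low entropy and a small loss, while a flat distribution (typical of low-textured regions) is penalized, which is one plausible way to realize the reliability objective the abstract describes.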
