Abstract

Estimating depth from monocular images is a powerful way to perceive the environment, and it is essential for applications that require three-dimensional (3D) scene models, such as autonomous driving and virtual reality. Deep-learning-based self-supervised monocular depth estimation has progressed rapidly because it requires no ground-truth depth. However, existing methods rely on a static-world assumption during training, and depth estimation in dynamic environments still needs further development. To address this problem, we propose a new self-supervised monocular depth estimation method for dynamic scenes that eliminates the negative impact of moving objects in the image sequence on the self-supervised loss. Specifically, within the self-supervised depth estimation framework, we propose a moving-object mask based on the minimum instance photometric residual and combine it with the instance re-projection residual mask used in existing instance-level moving-object segmentation methods. In addition, we design a moving-instance loss function to handle moving objects so that model training achieves better performance. Experiments on public datasets verify the effectiveness of the proposed method and each of its components, and the results show that our method outperforms state-of-the-art methods for depth estimation in dynamic scenes.
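
The abstract does not give the exact formulation of the minimum-instance-photometric-residual mask, but the idea of flagging instances for which view synthesis under the static-scene assumption fails can be sketched roughly as follows. This is a minimal, hypothetical PyTorch sketch: the function name, the margin parameter, and the decision rule (comparing per-instance means of the minimum warped and identity photometric residuals, in the spirit of per-pixel minimum reprojection and auto-masking) are assumptions for illustration, not the paper's actual implementation.

```python
import torch

def instance_moving_mask(photo_err_warped, photo_err_identity, instance_masks, margin=0.0):
    """Hypothetical instance-level moving-object mask.

    photo_err_warped:   (S, H, W) photometric residual between the target frame and
                        each of S source frames warped by predicted depth and ego-motion.
    photo_err_identity: (S, H, W) photometric residual between the target frame and
                        each unwarped source frame.
    instance_masks:     (K, H, W) boolean masks, one per detected instance.

    Returns a (K,) boolean tensor: True where the instance is flagged as moving,
    i.e. warping under the static-world assumption does not reduce its residual.
    """
    # Per-pixel minimum residual over source views (cf. minimum reprojection loss).
    min_warped = photo_err_warped.min(dim=0).values       # (H, W)
    min_identity = photo_err_identity.min(dim=0).values   # (H, W)

    moving = []
    for mask in instance_masks:
        # Mean minimum residual inside the instance region.
        warped_mean = min_warped[mask].mean()
        identity_mean = min_identity[mask].mean()
        # If view synthesis does not lower the residual for this instance,
        # the static-scene assumption is likely violated -> mark it as moving.
        moving.append(warped_mean + margin >= identity_mean)
    return torch.stack(moving)
```

Under this assumed formulation, the flagged instances could then be excluded from the standard photometric loss and handled separately, for example by a dedicated moving-instance loss term as proposed in the paper.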
