Abstract

Depth values are essential for automating surgical robots and achieving augmented reality in minimally invasive surgery. Although depth-pose self-supervised monocular depth estimation performs impressively in autonomous driving scenarios, predicting accurate depth values for laparoscopic images is more challenging for two reasons: (i) the laparoscope's motion contains many rotations, which makes pose estimation difficult for the depth-pose learning strategy; (ii) smooth tissue surfaces yield low photometric error even when pixel matches between adjacent frames are inaccurate. This paper proposes a novel self-supervised monocular depth estimation method for laparoscopic images with geometric constraints. We predict scene coordinates as an auxiliary task and construct dual-task consistency between the predicted depth maps and scene coordinates under a unified camera coordinate system, achieving pixel-level geometric constraints. We extend pose estimation into a Siamese process that leverages the order of adjacent frames in a video sequence, providing stronger and more balanced geometric constraints in the depth-pose learning strategy. We also design a weight mask for depth estimation based on our consistency measure to alleviate interference from low-confidence predictions. Experimental results showed that the proposed method outperformed the baseline on both depth and pose estimation. Our code is available at https://github.com/MoriLabNU/GCDepthL.

Keywords: Monocular depth estimation; Self-supervised learning; Laparoscopic images
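The dual-task consistency described in the abstract can be illustrated with a short sketch: the predicted depth map is back-projected into camera-space 3D points using the camera intrinsics, compared pixel-wise against the scene-coordinate head's output, and the residual drives both a consistency loss and a confidence weight mask. This is a minimal sketch assuming PyTorch; the function names, the exponential form of the weight mask, and the parameter `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def unproject_depth(depth, K):
    """Back-project a depth map into per-pixel 3D points in the camera frame.

    depth: (B, 1, H, W) predicted depth map
    K:     (B, 3, 3) camera intrinsics
    returns: (B, 3, H, W) camera-space coordinates
    """
    B, _, H, W = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype, device=depth.device),
        torch.arange(W, dtype=depth.dtype, device=depth.device),
        indexing="ij",
    )
    ones = torch.ones_like(xs)
    # Homogeneous pixel coordinates (u, v, 1), shape (1, 3, H*W)
    pix = torch.stack([xs, ys, ones], dim=0).reshape(1, 3, -1)
    # Viewing rays: K^{-1} [u, v, 1]^T, then scaled by depth
    rays = torch.linalg.inv(K) @ pix.expand(B, -1, -1)
    points = rays * depth.reshape(B, 1, -1)
    return points.reshape(B, 3, H, W)

def dual_task_consistency(depth, scene_coords, K, alpha=1.0):
    """Pixel-level consistency between the depth and scene-coordinate heads,
    plus a weight mask that down-weights pixels where the heads disagree.
    (The exponential mask and `alpha` are illustrative choices.)"""
    points_from_depth = unproject_depth(depth, K)
    # Per-pixel residual between the two representations, (B, 1, H, W)
    err = (points_from_depth - scene_coords).abs().mean(dim=1, keepdim=True)
    # Detach so the mask acts as a confidence weight, not a gradient path
    weight = torch.exp(-alpha * err.detach())
    loss = (weight * err).mean()
    return loss, weight
```

Detaching the residual before forming the mask keeps the weighting from collapsing the loss toward zero: gradients only flow through the residual term itself, so low-confidence pixels are merely down-weighted rather than optimized away.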
