Domain gap in adapting self-supervised depth estimation methods for stereo-endoscopy

Lalith Sharan, Matthias Karck, Sandy Engelhardt, Georgii Kostiuchik, Lukas Burger, Ivo Wolf, Raffaele De Simone

doi:10.1515/cdbme-2020-0004

Open Access

Abstract

In endoscopy, depth estimation is a task that potentially helps in quantifying visual information for better scene understanding. A plethora of depth estimation algorithms have been proposed in the computer vision community. The endoscopic domain, however, differs from the typical depth estimation scenario due to differences in the setup and nature of the scene. Furthermore, it is infeasible to obtain ground truth depth information owing to the unsuitable detection range of off-the-shelf depth sensors and the difficulty of setting up a depth sensor in a surgical environment. In this paper, an existing self-supervised approach, called Monodepth [1], from the field of autonomous driving is applied to a novel dataset of stereo-endoscopic images from reconstructive mitral valve surgery. While it is already known that endoscopic scenes are more challenging than outdoor driving scenes, the paper performs experiments to quantify the comparison and describes the domain gap and the challenges involved in transferring these methods.

Highlights

The task of depth estimation is a commonly encountered problem in computer vision
Sparse depth estimation methods focus on identifying matching feature points, or matching image patches [4]
Endoscopic scenes in the case of mitral valve repair are prone to specularities, reflection and occlusion artefacts

Summary

Introduction

The task of depth estimation is a commonly encountered problem in computer vision. Beyond prevalent applications in the field of autonomous driving and robotic navigation, depth estimation finds use for endoscopy in a [...] Depth estimation has been tackled through various approaches in the literature. The datasets comprise depth information acquired by depth sensors such as infrared or LiDAR cameras [5], which serves as the ground truth to supervise the learning. Where ground truth information is not available, the supervision comes from motion or binocular parallax, in other words from additional information in the temporal or spatial domain [6]. In endoscopy, the acquisition of ground truth depth information is infeasible due to logistical and safety considerations. Endoscopic scenes in the case of mitral valve repair are prone to specularities, reflections and occlusion artefacts. Occlusions occur when tissue or instruments partially obstruct the endoscopic field of view, and may persist for a major part of the surgery. The paper examines how existing self-supervised depth estimation approaches address this domain gap in endoscopy, in particular for mitral valve repair.
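
To make the idea of supervision from binocular parallax concrete, the following is a minimal sketch, not the implementation used in the paper or in Monodepth [1]. It assumes a rectified stereo pair, in which a left-image pixel at column x corresponds to the right-image pixel at column x - d for disparity d, and it uses only a plain L1 photometric error on a single image row. The function names, the focal length and the baseline are hypothetical values chosen for illustration.

```python
from typing import List, Sequence


def disparity_to_depth(disparity_px: float, focal_px: float, baseline_mm: float) -> float:
    """Rectified-stereo pinhole relation: depth = focal * baseline / disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_mm / disparity_px


def warp_right_to_left(right_row: Sequence[float],
                       disparity_row: Sequence[float]) -> List[float]:
    """Reconstruct a left-image row by sampling the right row at x - d."""
    width = len(right_row)
    reconstructed = []
    for x, d in enumerate(disparity_row):
        # Clamp the sampling position to the image border.
        src = min(max(x - d, 0.0), width - 1.0)
        x0 = int(src)
        x1 = min(x0 + 1, width - 1)
        w = src - x0
        # Linear interpolation between the two neighbouring pixels.
        reconstructed.append((1.0 - w) * right_row[x0] + w * right_row[x1])
    return reconstructed


def photometric_l1(left_row: Sequence[float],
                   right_row: Sequence[float],
                   disparity_row: Sequence[float]) -> float:
    """Mean absolute difference between the left row and its reconstruction."""
    recon = warp_right_to_left(right_row, disparity_row)
    return sum(abs(a - b) for a, b in zip(left_row, recon)) / len(left_row)


# Toy example: the left row is the right row shifted by 2 pixels.
right = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
left = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

loss_correct = photometric_l1(left, right, [2.0] * 8)  # disparity matches the shift
loss_wrong = photometric_l1(left, right, [0.0] * 8)    # disparity ignores the shift

# Hypothetical camera parameters: 500 px focal length, 4 mm stereo baseline.
depth_mm = disparity_to_depth(10.0, focal_px=500.0, baseline_mm=4.0)
```

The correct disparity reproduces the left row, so its photometric error is lower than that of the wrong disparity; a network trained to minimise this error therefore learns disparity without ground truth depth. Specularities and occlusions violate the assumption that corresponding pixels look alike, which is one reason such a loss is harder to satisfy in endoscopic scenes.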

Methods

Results

Conclusion