Abstract

Gastroendoscopy is the gold-standard procedure that enables medical doctors to investigate the inside of a patient's stomach. Monocular depth estimation from an endoscopic image enables the simultaneous acquisition of RGB and depth data, which can boost the diagnostic capability of endoscopy in various potential applications, such as RGB-D data acquisition for whole-stomach 3D reconstruction toward lesion localization, and local view expansion for lesion inspection. Therefore, deep-learning-based approaches are gaining traction for providing depth information in monocular endoscopy. Since it is very difficult to obtain ground-truth RGB and depth image pairs in clinical settings, computer-generated (CG) data is usually used to train the depth estimation network. However, CG data is limited in its ability to reproduce realistic RGB and depth appearance. In this paper, we propose a novel data generation strategy for self-supervised training to predict depth in gastroendoscopy. To obtain dense reference depth data for training, we first reconstruct a whole-stomach 3D model by exploiting chromoendoscopic images sprayed with indigo carmine (IC) blue dye. We then generate virtual no-IC images from the chromoendoscopic images using CycleGAN, making our depth estimation network applicable to general endoscopic images without IC dye. We experimentally demonstrate that our proposed approach achieves plausible depth prediction on both chromoendoscopic and general white-light endoscopic images.
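The data generation pipeline summarized above can be sketched as follows. All function names are hypothetical illustrations, not the authors' implementation: the CycleGAN generator is replaced by a trivial blue-channel attenuation placeholder, and the reference depth is a synthetic ramp standing in for depth rendered from the reconstructed stomach model.

```python
import numpy as np

def cyclegan_translate(ic_image: np.ndarray) -> np.ndarray:
    """Stand-in for a trained CycleGAN generator that maps a
    chromoendoscopic (IC-dyed) frame to a virtual no-IC frame.
    Placeholder only: attenuates the blue channel to mimic
    removing the indigo-carmine appearance."""
    out = ic_image.astype(np.float32).copy()
    out[..., 2] *= 0.5  # hypothetical: suppress IC blue
    return np.clip(out, 0, 255).astype(np.uint8)

def render_reference_depth(shape) -> np.ndarray:
    """Stand-in for dense depth rendered from the reconstructed
    whole-stomach 3D model at the frame's camera pose."""
    h, w = shape
    return np.linspace(10.0, 100.0, h * w, dtype=np.float32).reshape(h, w)

def build_training_pair(ic_image: np.ndarray):
    """Pair a virtual no-IC image with its dense reference depth,
    yielding one (RGB, depth) sample for supervising the network."""
    rgb = cyclegan_translate(ic_image)
    depth = render_reference_depth(ic_image.shape[:2])
    return rgb, depth

# Toy chromoendoscopic frame (uniform color as a minimal example).
frame = np.full((4, 4, 3), 200, dtype=np.uint8)
rgb, depth = build_training_pair(frame)
print(rgb.shape, depth.shape)  # (4, 4, 3) (4, 4)
```

In the actual method, the translated RGB frames and model-rendered depth maps would be collected over the whole video sequence to form the self-supervised training set.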
