Endoscopy plays a pivotal role in the early detection and treatment of many diseases, and artificial intelligence (AI)-assisted methods are becoming increasingly prominent in disease screening. Among these methods, depth estimation from endoscopic sequences is crucial for a range of AI-assisted surgical techniques. However, developing endoscopic depth estimation algorithms is challenging because of the complexity of the endoscopic environment and the limitations of the available datasets. This paper proposes a self-supervised depth estimation network that explicitly models the brightness changes in endoscopic images and fuses features at multiple levels to predict endoscopic depth accurately. First, a FlowNet is designed to evaluate the brightness changes between adjacent frames by calculating their multi-scale structural similarity. Second, a feature fusion module is presented to capture multi-scale contextual information. Experiments show that the algorithm achieves an average accuracy of 97.03% on the Stereo Correspondence and Reconstruction of Endoscopic Data (SCARED) dataset. Using the parameters trained on the SCARED dataset, the algorithm also achieves superior performance on two other datasets (EndoSLAM and KVASIR), indicating good generalization.