Abstract

In this paper, we address the problem of estimating remaining surgery duration (RSD) from surgical video frames. We propose BD-Net, a deep negative correlation learning (DNCL) approach built on Bayesian long short-term memory (LSTM) networks, which both regresses RSD accurately and estimates the uncertainty of its predictions. Our method extracts discriminative visual features from surgical video frames and models the temporal dependencies among frames to improve RSD prediction accuracy. To this end, we train an ensemble of Bayesian LSTMs on top of a backbone network by way of DNCL. More specifically, we learn a pool of decorrelated Bayesian regressors that generalize well by managing their intrinsic diversity. BD-Net is simple and efficient: after training, it produces both an RSD prediction and an uncertainty estimate in a single inference run. We demonstrate the efficacy of BD-Net on publicly available datasets covering two different types of surgery: one containing 101 microscopic cataract surgeries with short durations, and the other containing 80 laparoscopic cholecystectomy surgeries with relatively longer durations. Experimental results on both datasets show that BD-Net outperforms state-of-the-art (SOTA) methods. A reference implementation of our method can be found at: https://github.com/jywu511/BD-Net.
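
To make the idea of decorrelated regressors concrete, the following is a minimal sketch of the classical negative correlation learning penalty for a single frame, not the BD-Net implementation. It assumes each ensemble member is scored by its squared error minus a diversity term weighted by a hypothetical coefficient `lam`, and that the ensemble mean and variance serve as the prediction and a simple uncertainty proxy. The function names, the value of `lam`, and the use of plain ensemble variance are illustrative assumptions; the abstract does not specify the exact loss or uncertainty formulation.

```python
from statistics import fmean, pvariance


def ncl_member_losses(preds, target, lam=0.25):
    """Per-member negative correlation losses for one frame.

    preds  : RSD predictions (e.g. in minutes), one per ensemble member
    target : ground-truth RSD for the frame
    lam    : hypothetical weight of the diversity term (assumption)
    """
    mean = fmean(preds)
    # Accuracy term pulls each member toward the target; the diversity
    # term rewards members for deviating from the ensemble mean.
    return [0.5 * (p - target) ** 2 - lam * (p - mean) ** 2 for p in preds]


def ensemble_summary(preds):
    """Ensemble mean as the prediction, population variance as a
    simple uncertainty proxy (assumption, not the paper's estimator)."""
    return fmean(preds), pvariance(preds)


if __name__ == "__main__":
    member_preds = [10.0, 12.0, 14.0]  # illustrative values only
    print(ncl_member_losses(member_preds, target=12.0))
    print(ensemble_summary(member_preds))
```

In this sketch, a larger `lam` trades individual accuracy for diversity among members; with `lam = 0` each member is trained independently on squared error alone.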
