Automatic Detection of Out-of-Body Frames in Surgical Videos for Privacy Protection Using Self-Supervised Learning and Minimal Labels

Ziheng Wang,Anthony Jarc,Xi Liu,Conor Perreault

doi:10.1142/s2424905x23500022

Abstract

Endoscopic video recordings are widely used in minimally invasive robot-assisted surgery, but when the endoscope is outside the patient’s body, it can capture irrelevant segments that may contain sensitive information. To address this, we propose a framework that accurately detects out-of-body frames in surgical videos by leveraging self-supervision with minimal data labels. We use a massive amount of unlabeled endoscopic images to learn meaningful representations in a self-supervised manner. Our approach, which involves pre-training on an auxiliary task and fine-tuning with limited supervision, outperforms previous methods for detecting out-of-body frames in surgical videos captured from da Vinci X and Xi surgical systems. The average F1 scores range from [Formula: see text] to [Formula: see text]. Remarkably, using only [Formula: see text] of the training labels, our approach still maintains an average F1 score performance above 97, outperforming fully-supervised methods with [Formula: see text] fewer labels. These results demonstrate the potential of our framework to facilitate the safe handling of surgical video recordings and enhance data privacy protection in minimally invasive surgery.

Full Text