Blood vessel and surgical instrument segmentation is a fundamental technique for robot-assisted surgical navigation. Despite significant progress in natural image segmentation, vessel and instrument segmentation in surgical images has rarely been studied. In this work, we propose a novel self-supervised pretraining method (SurgNet) that effectively learns representative vessel and instrument features from unlabeled surgical images, enabling precise and efficient segmentation with only a small amount of labeled data. Specifically, we first construct a region adjacency graph (RAG) based on local semantic consistency in unlabeled surgical images and use it as a self-supervision signal for pseudo-mask segmentation. We then use the pseudo-masks to perform guided masked image modeling (GMIM), learning representations that better integrate the structural information of intraoperative objects. Paired with various segmentation methods, our pretrained model performs accurate vessel and instrument segmentation with only limited labeled data for fine-tuning. To evaluate the effectiveness of our self-supervised pretraining, we build an Intraoperative Vessel and Instrument Segmentation (IVIS) dataset comprising ~3 million unlabeled images and over 4,000 images with manual vessel and instrument annotations. We also evaluate the generalizability of our method on two public datasets for similar tasks. The results demonstrate that our approach outperforms current state-of-the-art (SOTA) self-supervised representation learning methods across various surgical image segmentation tasks.
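To make the guided-masking step concrete, the following is a minimal NumPy sketch of how pseudo-mask-guided patch masking for masked image modeling could work. The function name `guided_patch_mask`, the foreground bias `fg_bias`, and the 75% mask ratio are illustrative assumptions, not the paper's exact formulation; the abstract only states that RAG-derived pseudo-masks guide the masking.

```python
# Sketch: bias MIM patch masking toward pseudo-mask foreground.
# All parameter values here are assumptions for illustration.
import numpy as np

def guided_patch_mask(pseudo_mask, patch_size=16, mask_ratio=0.75,
                      fg_bias=3.0, rng=None):
    """Select patches to mask, biased toward pseudo-mask foreground.

    pseudo_mask: (H, W) binary array marking pseudo vessel/instrument pixels.
    Returns a boolean (H//patch_size, W//patch_size) grid; True = masked.
    """
    rng = rng or np.random.default_rng()
    h, w = pseudo_mask.shape
    gh, gw = h // patch_size, w // patch_size
    # Fraction of pseudo-foreground pixels within each patch.
    fg = pseudo_mask[:gh * patch_size, :gw * patch_size].astype(float)
    fg = fg.reshape(gh, patch_size, gw, patch_size).mean(axis=(1, 3)).ravel()
    # Foreground-rich patches are up to (1 + fg_bias)x more likely
    # to be masked than pure-background patches.
    weights = 1.0 + fg_bias * fg
    weights /= weights.sum()
    n_mask = int(mask_ratio * gh * gw)
    idx = rng.choice(gh * gw, size=n_mask, replace=False, p=weights)
    masked = np.zeros(gh * gw, dtype=bool)
    masked[idx] = True
    return masked.reshape(gh, gw)

# Toy usage: a rectangular pseudo-mask region on a 224x224 frame.
pm = np.zeros((224, 224), dtype=np.uint8)
pm[80:140, 60:180] = 1
grid = guided_patch_mask(pm)
print(grid.shape, grid.mean())  # (14, 14), ~0.75 of patches masked
```

Compared with uniform random masking, biasing the mask toward pseudo-foreground forces the encoder to reconstruct vessel and instrument structure rather than background texture, which matches the intuition described in the abstract.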