Learning laparoscopic video shot classification for gynecological surgery

Stefan Petscharnig, Klaus Schöffmann

doi:10.1007/s11042-017-4699-5

Open Access

https://doi.org/10.1007/s11042-017-4699-5

Journal: Multimedia Tools and Applications
Publication Date: Apr 22, 2017
Citations: 58
License type: open-access

Affiliation: University of Klagenfurt

Abstract

Videos of endoscopic surgery are used for the education of medical experts, for analysis in medical research, and for documentation in everyday clinical practice. Hand-crafted image descriptors lack the capability to semantically classify surgical actions and video shots of anatomical structures. In this work, we investigate how well single-frame convolutional neural networks (CNNs) perform at semantic shot classification in gynecologic surgery. Together with medical experts, we manually annotate hours of raw endoscopic gynecologic surgery videos showing endometriosis treatment and myoma resection of over 100 patients. The cleaned ground truth dataset comprises 9 h of annotated video material (from 111 different recordings). We use the well-known CNN architectures AlexNet and GoogLeNet and train both from scratch for surgical action and anatomy classification. Furthermore, we extract high-level features from AlexNet with weights from a pre-trained model from the Caffe model zoo and feed them to an SVM classifier. Our evaluation shows that, using off-the-shelf CNN features, we reach an average recall of .697 for anatomical structures and .515 for surgical actions. Using GoogLeNet, we achieve a mean recall of .782 and .617 for anatomical structures and surgical actions, respectively. With AlexNet, the recall is .615 for anatomical structures and .469 for surgical actions. The main conclusion of our work is that advances in general image classification methods transfer to the domain of endoscopic surgery videos in gynecology. This is relevant because this domain differs from natural images, e.g. it is characterized by smoke, reflections, and a limited range of colors.
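
The recall figures above can be read as averages of per-class recall. The following is a minimal sketch of that metric, assuming "average recall" means the unweighted mean of per-class recall (macro-averaged recall); the abstract does not spell out the exact averaging, and the class names and labels below are hypothetical illustration data, not taken from the dataset.

```python
from collections import defaultdict


def per_class_recall(y_true, y_pred):
    """Return {class: recall}, where recall = correct predictions / true samples of that class."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {cls: correct[cls] / total[cls] for cls in total}


def average_recall(y_true, y_pred):
    """Unweighted mean of per-class recall (assumed meaning of 'average recall')."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical per-frame ground truth and predictions for three anatomy classes.
y_true = ["uterus", "uterus", "ovary", "ovary", "ovary", "liver"]
y_pred = ["uterus", "ovary", "ovary", "ovary", "liver", "liver"]

print(per_class_recall(y_true, y_pred))
print(round(average_recall(y_true, y_pred), 3))
```

Averaging per class rather than per frame keeps frequent classes from dominating the score, which matters when some actions or anatomical structures appear in far fewer frames than others.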

Highlights

In recent years, endoscopic surgery procedures as well as imaging technology have advanced rapidly
We use the trained AlexNet and GoogLeNet models for action and anatomy classification, as well as SVM classifiers trained on the high-level convolutional neural network (CNN) feature vectors fc6, fc7, and class from the AlexNet architecture

Summary

Introduction

In recent years, endoscopic surgery procedures as well as imaging technology have advanced rapidly. These advances enable physicians to perform minimally invasive surgeries. As a side effect, the recorded surgery videos benefit the surgeons' work, as they provide a sound basis for documentation, training of young surgeons, and medical research. Our research group has conducted prior work supporting these aims in the field of endoscopic video analysis, such as a subjective quality assessment of the impact of compression on the perceived semantic quality [13], instrument classification in laparoscopic videos [17], and extraction and linking of endoscopic key-frames to videos [3, 23]. The importance of deep learning in medical image analysis, and of content-based processing and analysis of endoscopic images and video, is apparent from the work of Litjens et al. [9] and Muenzer et al. [12], respectively.

Objectives

Results

Conclusion