Abstract

Surgical tool presence detection in laparoscopic videos is a challenging problem that plays a critical role in developing context-aware systems for operating rooms (ORs). In this work, we propose a deep learning-based approach for detecting surgical tools in laparoscopic images using a convolutional neural network (CNN) combined with two long short-term memory (LSTM) models. A pre-trained CNN is first fine-tuned to learn visual features from individual images. The first LSTM then incorporates temporal information across a short video clip of neighbouring frames, and the second LSTM models temporal dependencies across the whole surgical video. Experimental evaluation on the Cholec80 dataset validates our approach. Results show that the most notable improvement comes from the two-stage LSTM model, and the proposed approach achieves performance better than or comparable to state-of-the-art methods.

Highlights

  • Analysing surgical workflow is a key factor in establishing intelligent technologies that aim to support surgical teams and optimize patient treatment inside the operating room (OR)

  • Additional challenges arise for several reasons: the multi-tool (multi-label) classification task, image blur caused by rapid camera movement, and tools masked by blood or tissue or obscured by smoke from electro-surgical cutting and coagulation

  • The proposed method consists of three main components: a convolutional neural network (CNN) model for extracting visual features and two long short-term memory (LSTM) models for incorporating temporal information
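The three components above can be sketched as a single PyTorch module. This is an illustrative sketch, not the paper's actual configuration: the tiny convolutional stack stands in for the pre-trained CNN backbone, and all layer sizes (`feat_dim`, `hidden`) are assumptions. The output dimension of 7 matches the seven tool classes annotated in Cholec80.

```python
import torch
import torch.nn as nn

class ToolPresenceNet(nn.Module):
    """Sketch of the CNN + two-stage LSTM pipeline: a CNN extracts
    per-frame visual features, a clip-level LSTM aggregates a short
    window of neighbouring frames, and a video-level LSTM models
    dependencies across the whole procedure."""

    def __init__(self, feat_dim=512, hidden=256, num_tools=7):
        super().__init__()
        # Stand-in for a pre-trained CNN (e.g. a ResNet trunk); a tiny
        # conv stack is used here so the sketch is self-contained.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, feat_dim),
        )
        self.clip_lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.video_lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_tools)  # per-tool logits

    def forward(self, clips):
        # clips: (batch, n_clips, clip_len, 3, H, W)
        b, n, t, c, h, w = clips.shape
        feats = self.cnn(clips.view(b * n * t, c, h, w))  # frame features
        feats = feats.view(b * n, t, -1)
        _, (clip_h, _) = self.clip_lstm(feats)            # clip summaries
        clip_feats = clip_h[-1].view(b, n, -1)
        video_out, _ = self.video_lstm(clip_feats)        # whole-video context
        return self.head(video_out)                       # (b, n_clips, num_tools)

model = ToolPresenceNet()
logits = model(torch.randn(2, 4, 5, 3, 64, 64))  # shape (2, 4, 7)
```

Since tool presence is multi-label (several tools can be visible at once), the logits would be passed through a per-tool sigmoid rather than a softmax, with a binary cross-entropy loss per class.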


Summary

Introduction

Analysing surgical workflow is a key factor in establishing intelligent technologies that aim to support surgical teams and optimize patient treatment inside the operating room (OR). Recognizing surgical workflow is a fundamental component of developing context-aware systems [1]. These systems can effectively monitor the workflow and communicate relevant information to human operators with different perspectives, e.g. the surgeon and the anaesthesiologist. Wang et al. demonstrated the feasibility of exploiting information across consecutive video frames [6]: in their approach, a graph convolutional network (GCN) captures temporal information from a video clip. Chen et al. explored a 3D convolutional network to extract spatiotemporal features from a video clip for tool classification [8].


