Abstract

Purpose: Automatic segmentation and classification of surgical activity is crucial for providing advanced support in computer-assisted interventions and autonomous functionalities in robot-assisted surgeries. Prior works have focused on recognizing either coarse activities, such as phases, or fine-grained activities, such as gestures. This work aims at jointly recognizing two complementary levels of granularity directly from videos, namely phases and steps.

Methods: We introduce two correlated surgical activities, phases and steps, for the laparoscopic gastric bypass procedure. We propose a multi-task multi-stage temporal convolutional network (MTMS-TCN) along with a multi-task convolutional neural network (CNN) training setup to jointly predict the phases and steps and benefit from their complementarity to better evaluate the execution of the procedure. We evaluate the proposed method on a large video dataset of 40 surgical procedures (Bypass40).

Results: We present experimental results from several baseline models for both phase and step recognition on the Bypass40 dataset. The proposed MTMS-TCN method outperforms single-task methods in both phase and step recognition by 1-2% in accuracy, precision, and recall. Furthermore, for step recognition, MTMS-TCN achieves a 3-6% improvement over LSTM-based models on all metrics.

Conclusion: In this work, we present a multi-task multi-stage temporal convolutional network for surgical activity recognition, which shows improved results compared to single-task models on a gastric bypass dataset with multi-level annotations. The proposed method shows that jointly modeling phases and steps improves the overall recognition of each type of activity.
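As a rough illustration of the multi-task CNN training setup described above, the sketch below (in PyTorch) pairs a shared visual backbone with separate phase and step classification heads trained with a joint loss. The ResNet-50 backbone, head sizes, and class counts are assumptions made for illustration, not details taken from the paper.

import torch
import torch.nn as nn
import torchvision.models as models

class MultiTaskCNN(nn.Module):
    # Shared visual backbone with two classification heads, one per task.
    # The ResNet-50 backbone and the class counts (11 phases, 44 steps)
    # are illustrative assumptions, not the paper's exact configuration.
    def __init__(self, num_phases=11, num_steps=44):
        super().__init__()
        backbone = models.resnet50(weights=None)
        feat_dim = backbone.fc.in_features          # 2048 for ResNet-50
        backbone.fc = nn.Identity()                 # expose raw features
        self.backbone = backbone
        self.phase_head = nn.Linear(feat_dim, num_phases)
        self.step_head = nn.Linear(feat_dim, num_steps)

    def forward(self, frames):                      # frames: (B, 3, H, W)
        feats = self.backbone(frames)
        return self.phase_head(feats), self.step_head(feats)

# Joint training step: sum the two cross-entropy losses.
model = MultiTaskCNN()
criterion = nn.CrossEntropyLoss()
frames = torch.randn(4, 3, 224, 224)                # dummy video frames
phase_labels = torch.randint(0, 11, (4,))
step_labels = torch.randint(0, 44, (4,))
phase_logits, step_logits = model(frames)
loss = criterion(phase_logits, phase_labels) + criterion(step_logits, step_labels)
loss.backward()

Summing the two cross-entropy terms is the simplest way to realize the joint objective; weighting the terms is a common variant when one task dominates.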

Highlights

  • Recent works in computer-assisted interventions and robot-assisted minimally invasive surgery have seen significant progress in developing advanced support technologies for the demanding scenarios of a modern operating room (OR) [6,21,27].

  • Similar to [17], we introduce a hierarchical representation of the laparoscopic Roux-en-Y gastric bypass (LRYGB) procedure containing phases and steps, representing the workflow performed in our hospital, and focus our attention on the recognition of these two types of activities.

  • To jointly learn the tasks of phase and step recognition, we introduce MTMS-TCN, a multi-task multi-stage temporal convolutional network that extends the multi-stage TCNs (MS-TCNs) [9] proposed for action segmentation; a sketch follows this list.
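To make the architecture in the last point concrete, here is a minimal PyTorch sketch of a multi-task multi-stage TCN: each stage stacks dilated residual 1D convolutions as in MS-TCN [9] and emits both phase and step logits, and each refinement stage consumes the concatenated softmax outputs of the previous one. All class names, channel widths, and class counts here are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn

class DilatedResidualLayer(nn.Module):
    # One dilated temporal convolution with a residual connection, as in
    # MS-TCN [9]; operates on (B, C, T) feature sequences.
    def __init__(self, channels, dilation):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=3,
                              padding=dilation, dilation=dilation)
        self.out = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x):
        return x + self.out(torch.relu(self.conv(x)))

class MultiTaskStage(nn.Module):
    # One TCN stage with doubling dilations and two per-frame prediction
    # heads (phases and steps). Channel width and depth are assumptions.
    def __init__(self, in_dim, channels, num_phases, num_steps, num_layers=10):
        super().__init__()
        self.inp = nn.Conv1d(in_dim, channels, kernel_size=1)
        self.layers = nn.ModuleList(
            [DilatedResidualLayer(channels, 2 ** i) for i in range(num_layers)])
        self.phase_out = nn.Conv1d(channels, num_phases, kernel_size=1)
        self.step_out = nn.Conv1d(channels, num_steps, kernel_size=1)

    def forward(self, x):
        h = self.inp(x)
        for layer in self.layers:
            h = layer(h)
        return self.phase_out(h), self.step_out(h)

class MTMSTCNSketch(nn.Module):
    # Stacks stages; each refinement stage consumes the previous stage's
    # concatenated phase and step probabilities.
    def __init__(self, feat_dim=2048, channels=64,
                 num_phases=11, num_steps=44, num_stages=2):
        super().__init__()
        stages = [MultiTaskStage(feat_dim, channels, num_phases, num_steps)]
        for _ in range(num_stages - 1):
            stages.append(MultiTaskStage(num_phases + num_steps, channels,
                                         num_phases, num_steps))
        self.stages = nn.ModuleList(stages)

    def forward(self, feats):                       # feats: (B, feat_dim, T)
        phase, step = self.stages[0](feats)
        outputs = [(phase, step)]
        for stage in self.stages[1:]:
            x = torch.cat([phase.softmax(dim=1), step.softmax(dim=1)], dim=1)
            phase, step = stage(x)
            outputs.append((phase, step))
        return outputs                              # per-stage (phase, step) logits

# Example: per-frame predictions for a 100-frame feature sequence.
outputs = MTMSTCNSketch()(torch.randn(1, 2048, 100))

In training, the per-stage phase and step cross-entropy losses would typically be summed, mirroring the multi-stage supervision of MS-TCN.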


Introduction

Recent works in computer-assisted interventions and robot-assisted minimally invasive surgery have seen significant progress in developing advanced support technologies for the demanding scenarios of a modern operating room (OR) [6,21,27]. The visual detection of phases [7,15,16,25,30], robotic gestures [2,10,26,29], and instruments [11,14,16,22] has, for instance, seen a surge in research activity, due to its potential impact on developing intra- and postoperative tools for monitoring safety, assessing skills, and reporting. Many of these previous works have focused on endoscopic cholecystectomy procedures, utilizing the publicly available large-scale Cholec80 dataset [25], and on cataract surgical procedures, utilizing the popular CATARACTS dataset [11,30].
