Abstract

Surgical workflow analysis aims to recognise surgical phases from untrimmed surgical videos and is an integral component of context-aware computer-assisted surgery systems. Many deep learning-based methods have been developed for this task. However, most existing works aggregate homogeneous temporal context for all frames at a single level, neglecting the fact that each frame needs its own combination of information from multiple levels for accurate phase prediction. To fill this gap, in this paper we propose the Cascade Multi-Level Transformer Network (CMTNet), composed of cascaded Adaptive Multi-Level Context Aggregation (AMCA) modules. Each AMCA module first extracts temporal context at both the frame level and the phase level, and then adaptively fuses the frame-specific spatial features, frame-level temporal context, and phase-level temporal context for each frame. By cascading multiple AMCA modules, CMTNet gradually enriches the representation of each frame with the multi-level semantics it specifically requires, yielding better phase prediction in a frame-adaptive manner. In addition, we propose a novel refinement loss for CMTNet that explicitly guides each AMCA module to extract the key context for refining the prediction of the previous stage in terms of both prediction confidence and smoothness, further improving the quality of the extracted context. Extensive experiments on the Cholec80 and M2CAI datasets demonstrate that CMTNet achieves state-of-the-art performance.
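
The abstract describes the AMCA design only at a high level. As an illustration of the frame-adaptive fusion idea, the minimal PyTorch sketch below shows one plausible way a module could combine the three information levels with per-frame gating; the class name `AdaptiveMultiLevelFusion`, the convolutional frame-level context, the soft phase pooling, and the gating network are assumptions for this sketch, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class AdaptiveMultiLevelFusion(nn.Module):
    """Illustrative sketch (not the paper's code) of an AMCA-style module:
    each frame adaptively weights its spatial feature, a frame-level
    temporal context, and a phase-level temporal context."""

    def __init__(self, dim: int):
        super().__init__()
        # Frame-level temporal context from a local 1-D convolution.
        self.frame_ctx = nn.Conv1d(dim, dim, kernel_size=5, padding=2)
        # Per-frame gate scoring the three information levels.
        self.gate = nn.Linear(3 * dim, 3)

    def forward(self, spatial: torch.Tensor, phase_logits: torch.Tensor):
        # spatial: (T, dim) per-frame spatial features
        # phase_logits: (T, P) soft phase predictions from a previous stage
        frame_ctx = self.frame_ctx(spatial.t().unsqueeze(0)).squeeze(0).t()

        # Phase-level context: soft-assign frames to phases, pool features
        # per phase, then read the pooled feature back for every frame.
        assign = phase_logits.softmax(dim=-1)                    # (T, P)
        pooled = assign.t() @ spatial                            # (P, dim)
        pooled = pooled / assign.sum(0).unsqueeze(-1).clamp(min=1e-6)
        phase_ctx = assign @ pooled                              # (T, dim)

        # Frame-adaptive fusion: each frame picks its own mixture weights.
        w = self.gate(torch.cat([spatial, frame_ctx, phase_ctx], dim=-1))
        w = w.softmax(dim=-1).unsqueeze(-1)                      # (T, 3, 1)
        levels = torch.stack([spatial, frame_ctx, phase_ctx], dim=1)
        return (w * levels).sum(dim=1)                           # (T, dim)


# Usage: fuse 100 frames of 256-d features across 7 surgical phases.
fusion = AdaptiveMultiLevelFusion(dim=256)
out = fusion(torch.randn(100, 256), torch.randn(100, 7))
print(out.shape)  # torch.Size([100, 256])
```

Cascading several such modules, each consuming the logits produced by the previous stage, would mirror the stage-wise refinement the abstract describes.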
