Abstract

The ability to accurately and automatically segment surgical instruments is an important prerequisite for the reliable and stable operation of surgical robots. Deep learning for medical image segmentation has gained widespread recognition in recent years, and many network models have been proposed for segmenting diverse medical images, among which the most effective are U-Net and its variants. Nevertheless, these existing networks have various drawbacks, such as limited contextual representation capability and insufficient local feature processing. To address these problems and achieve more accurate surgical instrument segmentation, a transformer-based multi-scale attention network, referred to as TMA-Net, is proposed for segmenting surgical instruments in endoscopic images in support of robot-assisted surgery. To extract image features more accurately, a dual-branch encoder structure is proposed to obtain stronger contextual representations. Further, to address the problem that a simple skip connection is insufficient for local feature processing, an attention feature fusion (AFF) module and an additive attention and concatenation (AAC) module are proposed to learn effective features and filter out irrelevant information in the low-level features. Furthermore, a multi-scale context fusion (MCF) block is introduced to enhance the local feature maps and capture multi-scale contextual information. The efficacy of the proposed TMA-Net is demonstrated through experiments on publicly available surgical instrument segmentation datasets, including Endovis2017 and UW-Sinus-Surgery-C/L. The results show that the proposed TMA-Net outperforms existing methods in surgical instrument segmentation accuracy.
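The abstract does not detail the internals of the AFF module, but attention-based feature fusion in U-Net-style decoders commonly works by deriving a channel-wise gate from the combined features and using it to weight the low-level (skip-connection) branch against the high-level (decoder) branch. The sketch below is a minimal, framework-free illustration of that general idea in NumPy; the function name, the global-average-pooling gate, and the sigmoid weighting are assumptions for illustration, not the paper's actual design.

```python
import numpy as np


def attention_feature_fusion(low, high):
    """Hypothetical sketch of attention-based feature fusion.

    low:  low-level feature map from the encoder skip path, shape (C, H, W)
    high: high-level feature map from the decoder path, shape (C, H, W)

    A channel descriptor is pooled from the summed features, squashed to
    (0, 1) with a sigmoid, and used as a soft gate that trades off the two
    branches per channel -- suppressing irrelevant low-level information.
    """
    fused = low + high                      # element-wise combination
    w = fused.mean(axis=(1, 2))             # global average pooling -> (C,)
    a = 1.0 / (1.0 + np.exp(-w))            # sigmoid gate per channel
    a = a[:, None, None]                    # broadcast over spatial dims
    return a * low + (1.0 - a) * high       # attention-weighted fusion
```

In an actual network the gate would typically be produced by small learned convolutions rather than plain pooling, but the gating-and-blending structure is the core mechanism such fusion modules share.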

