Abstract

Instrument segmentation is a crucial and challenging task in robot-assisted surgery. Recent widely used models extract feature maps at multiple scales and combine them with simple yet suboptimal feature fusion strategies. In this paper, we propose a hierarchical attentional feature fusion scheme that is efficient and compatible with encoder-decoder architectures. Specifically, to better combine feature maps from adjacent scales, we introduce dense pixel-wise relative attentions learned by the segmentation model; to correct specific failure modes in the predicted masks, we integrate this attentional feature fusion strategy, built on position- and channel-aware parallel attention, into the decoder. Extensive experiments on three datasets from the MICCAI 2017 EndoVis Challenge demonstrate that our model outperforms state-of-the-art counterparts by a large margin.
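
To make the fusion idea concrete, the following is a minimal PyTorch sketch of attention-guided fusion between feature maps from two adjacent scales: the coarser map is upsampled, both maps are projected to a common channel width, and dense pixel-wise relative weights (a softmax over the two scales at every position) blend them. This is an illustrative sketch under assumed shapes and module names (`AttentionalFusion`, `proj_fine`, `proj_coarse`, `attn`), not the authors' implementation; the position- and channel-aware parallel attention used in the actual decoder is omitted here.

```python
# Illustrative sketch of pixel-wise attentional fusion of adjacent-scale
# feature maps. All names and shapes are assumptions for demonstration,
# not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionalFusion(nn.Module):
    """Fuse a fine-scale and a coarse-scale feature map using dense,
    pixel-wise relative attention weights predicted from the features."""
    def __init__(self, fine_ch: int, coarse_ch: int, out_ch: int):
        super().__init__()
        # Project both inputs to a common channel width.
        self.proj_fine = nn.Conv2d(fine_ch, out_ch, kernel_size=1)
        self.proj_coarse = nn.Conv2d(coarse_ch, out_ch, kernel_size=1)
        # Predict one attention map per scale at every pixel; a softmax
        # across the two maps yields relative weights that sum to 1.
        self.attn = nn.Conv2d(2 * out_ch, 2, kernel_size=3, padding=1)

    def forward(self, fine: torch.Tensor, coarse: torch.Tensor) -> torch.Tensor:
        # Upsample the coarser map to the fine map's spatial size.
        coarse = F.interpolate(coarse, size=fine.shape[-2:],
                               mode="bilinear", align_corners=False)
        f, c = self.proj_fine(fine), self.proj_coarse(coarse)
        # Dense pixel-wise relative attention between the two scales.
        w = torch.softmax(self.attn(torch.cat([f, c], dim=1)), dim=1)
        return w[:, 0:1] * f + w[:, 1:2] * c

# Usage: fuse a 1/4-scale map (64 channels) with a 1/8-scale map (128 channels).
fine = torch.randn(1, 64, 56, 56)
coarse = torch.randn(1, 128, 28, 28)
fused = AttentionalFusion(64, 128, 64)(fine, coarse)
print(fused.shape)  # torch.Size([1, 64, 56, 56])
```

Learning the per-pixel weights, rather than summing or concatenating the scales uniformly, lets the model favor fine-scale detail near instrument boundaries and coarse-scale context elsewhere.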
