Abstract

Instrument-tissue interaction detection in surgical videos is a fundamental problem for surgical scene understanding, which is of great significance to computer-assisted surgery. However, few works focus on this fine-grained surgical activity representation. In this paper, we propose to represent instrument-tissue interaction as \(\langle \)instrument bounding box, tissue bounding box, instrument class, tissue class, action class\(\rangle \) quintuples. We present a novel quintuple detection network (QDNet) for the instrument-tissue interaction quintuple detection task in cataract surgery videos. Specifically, a spatiotemporal attention layer (STAL) is proposed to aggregate spatial and temporal information of the regions of interest between adjacent frames. We also propose a graph-based quintuple prediction layer (GQPL) to reason the relationship between instruments and tissues. Our method achieves an \(\textrm{mAP}\) of 42.24% on a cataract surgery video dataset, significantly outperforming other methods.

Keywords: Instrument-tissue interaction quintuple detection; Surgical scene understanding; Surgery video
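The quintuple representation above can be sketched as a simple record type. This is an illustrative sketch only; the field names, class labels, and box convention are assumptions, not taken from the paper's code or dataset.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed convention: (x1, y1, x2, y2) corner coordinates in pixels.
BBox = Tuple[float, float, float, float]

@dataclass
class InteractionQuintuple:
    """One instrument-tissue interaction, as described in the abstract:
    two bounding boxes plus instrument, tissue, and action classes."""
    instrument_box: BBox
    tissue_box: BBox
    instrument_class: str
    tissue_class: str
    action_class: str

# Hypothetical example; the labels below are illustrative, not from the dataset.
q = InteractionQuintuple(
    instrument_box=(120.0, 80.0, 200.0, 160.0),
    tissue_box=(150.0, 100.0, 300.0, 260.0),
    instrument_class="forceps",
    tissue_class="lens",
    action_class="grasp",
)
print(q.action_class)  # → grasp
```

A detector for this task would emit a list of such quintuples per video frame, one entry per detected interaction.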
