Abstract

Surgical interaction recognition (SIR) plays a crucial role in navigation decision support for minimally invasive surgery (MIS) and robot-assisted MIS. Current SIR research operates at a coarse-grained level and barely considers the dependencies among surgical interactions that are independent of the endoscopic images. This work proposes a fine-grained SIR method, named SIRNet, that predicts surgical interaction triplets. In SIRNet, a multi-head self-attention mechanism learns the relations among surgical interaction triplets without requiring them to be defined before training, while a multi-head cross-attention mechanism learns the relations between the endoscopic images and each triplet. A bipartite matching loss, which accounts for the permutations and combinations of instruments, verbs, and targets, is adopted so that each component of a surgical interaction triplet is learned and predicted appropriately. Moreover, a weight attention module is designed to weigh the importance of each predicted triplet, and of each component within a triplet, when selecting the final valid surgical interaction triplets. Experimental results show that the proposed method improves fine-grained SIR performance, and further experiments demonstrate the effectiveness of each module. The code is available at https://github.com/cynerelee/SIRNet.
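As a rough illustration of the attention design the abstract describes, the sketch below shows a DETR-style decoder layer in PyTorch in which learned triplet queries attend to one another (self-attention, modeling inter-triplet relations) and to flattened endoscopic image features (cross-attention), together with a Hungarian bipartite matching whose cost sums over the instrument, verb, and target components. All names, dimensions, class counts, and the exact cost form here are assumptions for illustration, not the authors' implementation; see the linked repository for the actual code.

```python
# Minimal sketch under assumed details; not the authors' implementation
# (see https://github.com/cynerelee/SIRNet for the real code).
import torch
import torch.nn as nn
from scipy.optimize import linear_sum_assignment


class TripletDecoderLayer(nn.Module):
    """Self-attention over triplet queries learns inter-triplet relations;
    cross-attention relates each triplet query to endoscopic image features."""

    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, queries, img_feats):
        # queries: (B, num_triplets, d); img_feats: (B, H*W, d) from a backbone
        q = self.norm1(queries + self.self_attn(queries, queries, queries)[0])
        q = self.norm2(q + self.cross_attn(q, img_feats, img_feats)[0])
        return self.norm3(q + self.ffn(q))


def match_triplets(logits_i, logits_v, logits_t, gt_i, gt_v, gt_t):
    """Hungarian matching of predicted to ground-truth triplets; the cost of
    each (prediction, ground truth) pair sums the negative class probabilities
    of all three components (assumed cost form)."""
    cost = -(logits_i.softmax(-1)[:, gt_i]      # (num_queries, num_gt)
             + logits_v.softmax(-1)[:, gt_v]
             + logits_t.softmax(-1)[:, gt_t])
    return linear_sum_assignment(cost.detach().cpu().numpy())
```

Per-query linear heads (not shown) would produce `logits_i`, `logits_v`, and `logits_t` for each triplet query, after which a weighting module could rescale the triplet and component scores before the final prediction, as the abstract describes.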
