Abstract
Automatically recording surgical procedures and generating surgical reports are crucial for alleviating surgeons' workload and enabling them to concentrate more on the operations themselves. Despite some progress, previous works still suffer from two issues: 1) failure to model the interactive relationship between surgical instruments and tissue; and 2) neglect of fine-grained differences among surgical images within the same surgery. To address these two issues, we propose an improved scene graph-guided Transformer, termed SGT++, to generate more accurate surgical reports, in which the complex interactions between surgical instruments and tissue are learnt from both explicit and implicit perspectives. Specifically, to facilitate understanding of the surgical scene graph within a graph learning framework, a simple yet effective approach is proposed for homogenizing the input heterogeneous scene graph. For the homogenized scene graph, which contains explicit structured and fine-grained semantic relationships, we design an attention-induced graph transformer for node aggregation via an explicit relation-aware encoder. In addition, to characterize the implicit relationships among the instruments, the tissue, and their interactions, an implicit relational attention is proposed to take full advantage of prior knowledge from an interactional prototype memory. The learnt explicit and implicit relation-aware representations are then coalesced into fused relation-aware representations, which contribute to report generation. Comprehensive experiments on two surgical datasets show that the proposed SGT++ model achieves state-of-the-art results.
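To make the implicit-relation step concrete, the sketch below shows one plausible reading of attention over a learnable prototype memory and its fusion with explicit graph features. It is a minimal illustration under assumed details, not the paper's implementation: the class name `ImplicitRelationalAttention`, the feature dimension, the number of prototypes, and the residual fusion are all hypothetical choices.

```python
import torch
import torch.nn as nn


class ImplicitRelationalAttention(nn.Module):
    """Sketch: attend from scene-graph node features to a learnable
    interactional prototype memory, then fuse the result back into the
    explicit relation-aware features. All sizes are illustrative."""

    def __init__(self, dim: int = 256, num_prototypes: int = 32):
        super().__init__()
        # Hypothetical learnable memory of interaction prototypes.
        self.memory = nn.Parameter(torch.randn(num_prototypes, dim))
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)

    def forward(self, nodes: torch.Tensor) -> torch.Tensor:
        # nodes: (batch, num_nodes, dim) explicit relation-aware features.
        q = self.q_proj(nodes)        # queries come from graph nodes
        k = self.k_proj(self.memory)  # keys come from the prototypes
        v = self.v_proj(self.memory)  # values come from the prototypes
        # Scaled dot-product attention over the prototype memory.
        attn = torch.softmax(q @ k.T / q.shape[-1] ** 0.5, dim=-1)
        implicit = attn @ v           # implicit relation-aware features
        # Coalesce explicit and implicit representations; a simple
        # residual sum stands in for whatever fusion the paper uses.
        return nodes + implicit


# Usage: enrich explicit graph features with prototype-derived cues.
layer = ImplicitRelationalAttention()
explicit_feats = torch.randn(2, 8, 256)  # e.g. 8 scene-graph nodes per image
fused = layer(explicit_feats)            # (2, 8, 256) fused representations
```

The fused representations would then feed the report decoder; the residual fusion here is only one reasonable option among several (gating or concatenation would fit the description equally well).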