Advancing surgical VQA with scene graph knowledge

Kun Yuan, Manasi Kattel, Joël L Lavanchy, Nassir Navab, Vinkle Srivastav, Nicolas Padoy

doi:10.1007/s11548-024-03141-y

Open Access

Abstract

Purpose: The modern operating room is becoming increasingly complex, requiring innovative intra-operative support systems. While the focus of surgical data science has largely been on video analysis, integrating surgical computer vision with natural language capabilities is emerging as a necessity. Our work aims to advance visual question answering (VQA) in the surgical context with scene graph knowledge, addressing two main challenges in the current surgical VQA systems: removing question–condition bias in the surgical VQA dataset and incorporating scene-aware reasoning in the surgical VQA model design.

Methods: First, we propose a surgical scene graph-based dataset, SSG-VQA, generated by employing segmentation and detection models on publicly available datasets. We build surgical scene graphs using spatial and action information of instruments and anatomies. These graphs are fed into a question engine, generating diverse QA pairs. We then propose SSG-VQA-Net, a novel surgical VQA model incorporating a lightweight Scene-embedded Interaction Module, which integrates geometric scene knowledge in the VQA model design by employing cross-attention between the textual and the scene features.

Results: Our comprehensive analysis shows that our SSG-VQA dataset provides a more complex, diverse, geometrically grounded, unbiased and surgical action-oriented dataset compared to existing surgical VQA datasets and SSG-VQA-Net outperforms existing methods across different question types and complexities. We highlight that the primary limitation in the current surgical VQA systems is the lack of scene knowledge to answer complex queries.

Conclusion: We present a novel surgical VQA dataset and model and show that results can be significantly improved by incorporating geometric scene features in the VQA model design. We point out that the bottleneck of the current surgical visual question–answer model lies in learning the encoded representation rather than decoding the sequence. Our SSG-VQA dataset provides a diagnostic benchmark to test the scene understanding and reasoning capabilities of the model. The source code and the dataset will be made publicly available at: https://github.com/CAMMA-public/SSG-VQA.
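
To make the dataset-generation idea in the abstract more concrete, the following is a minimal sketch of how a scene graph might be turned into QA pairs by a template-based question engine. It is not the authors' implementation: the `SceneGraph` class, the `generate_qa_pairs` function, the question templates, and the example labels (`grasper`, `gallbladder`, `retract`, the location strings) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Toy scene graph (hypothetical structure, not the SSG-VQA format)."""
    # Node name -> attributes, e.g. a coarse spatial location in the frame.
    nodes: dict = field(default_factory=dict)
    # Edges as (subject, relation, object) triplets, e.g. instrument actions.
    edges: list = field(default_factory=list)


def generate_qa_pairs(graph):
    """Fill simple question templates from the graph's nodes and edges."""
    qa_pairs = []
    names = sorted(graph.nodes)

    # Existence questions: one per node.
    for name in names:
        qa_pairs.append((f"Is there a {name} in the image?", "yes"))

    # Counting question over all nodes.
    qa_pairs.append(("How many objects are in the image?", str(len(names))))

    # Spatial questions: only for nodes that carry a location attribute.
    for name in names:
        location = graph.nodes[name].get("location")
        if location:
            qa_pairs.append((f"Where is the {name} located?", location))

    # Action questions: one per (subject, relation, object) edge.
    for subject, relation, obj in graph.edges:
        qa_pairs.append((f"What is the {subject} doing to the {obj}?", relation))

    return qa_pairs


# Illustrative graph; labels are assumptions, not taken from the dataset.
graph = SceneGraph(
    nodes={
        "grasper": {"location": "top-left"},
        "gallbladder": {"location": "center"},
    },
    edges=[("grasper", "retract", "gallbladder")],
)

for question, answer in generate_qa_pairs(graph):
    print(question, "->", answer)
```

Because every answer in this sketch is read directly from the graph, each QA pair stays grounded in the spatial and action information the graph encodes. That grounding is the property the abstract attributes to SSG-VQA.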

Journal: International Journal of Computer Assisted Radiology and Surgery
Publication Date: May 23, 2024
Citations: 1
License type: CC BY 4.0
