SDA-CLIP: surgical visual domain adaptation using video and text labels.

Yuchong Li, Shuangfu Jia, Guangbi Song, Ping Wang, Fucang Jia

doi:10.21037/qims-23-376

Abstract

Surgical action recognition is an essential technology for context-aware autonomous surgery, but its accuracy is limited by the scale of clinical datasets. Using surgical videos from virtual reality (VR) simulations to develop algorithms for the clinical domain, a setting known as domain adaptation, can effectively reduce the cost of data acquisition and annotation and can protect patient privacy. We introduced a surgical domain adaptation method based on the contrastive language-image pretraining model (SDA-CLIP) for cross-domain surgical action recognition. Specifically, we used a Vision Transformer (ViT) and a Transformer to extract video and text embeddings, respectively. The text embeddings served as a bridge between the VR and clinical domains. Inter- and intra-modality loss functions were employed to enhance the consistency of embeddings of the same class. We evaluated our method on the MICCAI 2020 EndoVis Challenge SurgVisDom dataset. SDA-CLIP achieved a weighted F1-score of 65.9% (+18.9%) on the hard domain adaptation task (trained only with VR data) and 84.4% (+4.4%) on the soft domain adaptation task (trained with VR and clinical-like data), outperforming the first-place team of the challenge by a significant margin. The proposed SDA-CLIP model effectively extracts video scene information and textual semantic information, which substantially improves the performance of cross-domain surgical action recognition. The code is available at https://github.com/Lycus99/SDA-CLIP.
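The abstract names its building blocks but does not give their exact formulas, so the following is only a minimal, hypothetical sketch in plain Python of the general ideas involved: classifying a video embedding by its similarity to the text embeddings of action labels (CLIP-style matching), a symmetric CLIP-style contrastive loss as a stand-in for the inter-modality loss, a simple same-class "pull" term as a stand-in for the intra-modality loss, and the support-weighted F1-score used for evaluation. Embeddings are plain lists of floats here, whereas SDA-CLIP obtains them from a ViT and a Transformer; all function names, the 0.07 temperature, and both loss forms are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of CLIP-style ideas; NOT the SDA-CLIP implementation.
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def predict(video_emb, label_text_embs):
    """Index of the action label whose text embedding is closest to the video embedding."""
    scores = [cosine(video_emb, t) for t in label_text_embs]
    return max(range(len(scores)), key=scores.__getitem__)


def _cross_entropy(logits, target):
    """Numerically stable -log(softmax(logits)[target])."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def clip_loss(video_embs, text_embs, temperature=0.07):
    """Assumed inter-modality loss: symmetric InfoNCE where pair i is (video_embs[i], text_embs[i])."""
    n = len(video_embs)
    sim = [[cosine(v, t) / temperature for t in text_embs] for v in video_embs]
    # Rows: each video scored against all texts; columns: each text scored against all videos.
    video_to_text = sum(_cross_entropy(sim[i], i) for i in range(n)) / n
    text_to_video = sum(_cross_entropy([sim[j][i] for j in range(n)], i) for i in range(n)) / n
    return (video_to_text + text_to_video) / 2


def intra_modality_loss(embs, labels):
    """Hypothetical intra-modality stand-in: mean (1 - cosine) over pairs sharing a label."""
    pairs = [(i, j) for i in range(len(embs)) for j in range(i + 1, len(embs))
             if labels[i] == labels[j]]
    if not pairs:
        return 0.0
    return sum(1 - cosine(embs[i], embs[j]) for i, j in pairs) / len(pairs)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for c, n_c in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = n_c - tp
        # 2*tp + fp + fn >= n_c > 0, so the division is always defined.
        score += (n_c / total) * (2 * tp / (2 * tp + fp + fn))
    return score
```

For instance, with two hypothetical label embeddings `[[1.0, 0.0], [0.0, 1.0]]`, `predict([0.9, 0.2], ...)` returns `0`, and `round(weighted_f1([0, 0, 1, 1], [0, 1, 1, 1]), 3)` gives `0.733` (class F1-scores of 2/3 and 0.8, each weighted by a support of 2 out of 4). Note that the diagonal targets in `clip_loss` treat every other pair in a batch as a negative, even when two clips share an action label; a same-class term such as `intra_modality_loss` is one plausible way to counteract that, though the paper's actual formulation may differ.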

Journal: Quantitative Imaging in Medicine and Surgery
Publication date: Oct 1, 2023
Citations: 1
License: CC BY-NC-ND

Similar Papers

A web-based virtual reality environment for medical visualization
Ziga Kokelj ... Ciril Bohak
01 May 2018

Exploring Visualisations for Financial Statements in Virtual Reality
Tanja Kojic ... Sandra Ashipala
01 Dec 2020

Extended-Reality Technologies: An Overview of Emerging Applications in Medical Education and Clinical Care.
Wilfredo López-Ojeda ... Robin A Hurley
The Journal of neuropsychiatry and clinical neurosciences | VOL. 33 | 01 Jul 2021

A Systematic Review of Physiological Measurements, Factors, Methods, and Applications in Virtual Reality
Andreas Halbig ... Marc Erich Latoschik
Frontiers in Virtual Reality | VOL. 2 | 14 Jul 2021