Zero-Shot Medical Phrase Grounding with Off-the-shelf Diffusion Models.

Konstantinos Vilouras, Pedro Sanchez, Alison Q O'Neil, Sotirios A Tsaftaris

doi:10.1109/jbhi.2024.3494246

Abstract

Localizing the exact pathological regions in a given medical scan is an important imaging problem that traditionally requires a large number of ground-truth bounding-box annotations to solve accurately. However, alternative and potentially weaker forms of supervision exist, such as the accompanying free-text reports, which are readily available. The task of performing localization with textual guidance is commonly referred to as phrase grounding. In this work, we use a publicly available foundation model, namely the Latent Diffusion Model, to perform this challenging task. This choice is supported by the fact that the Latent Diffusion Model, despite being generative in nature, contains cross-attention mechanisms that implicitly align visual and textual features, leading to intermediate representations that are suitable for the task at hand. In addition, we aim to perform this task in a zero-shot manner, i.e., without any training on the target task, meaning that the model's weights remain frozen. To this end, we devise strategies to select features and to refine them via post-processing without extra learnable parameters. We compare our proposed method with state-of-the-art (SOTA) approaches that explicitly enforce image-text alignment in a joint embedding space via contrastive learning. Results on a popular chest X-ray benchmark indicate that our method is competitive with SOTA approaches on different types of pathology, and even outperforms them on average in terms of two metrics (mean IoU and AUC-ROC). Source code will be released upon acceptance at https://github.com/vios-s.
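To make the idea concrete, the following is a minimal illustrative sketch, not the authors' implementation, of how a cross-attention map between image features and text-token features could be turned into a localization heatmap and scored with IoU against a ground-truth mask. All function names, array shapes, and the choice to average over selected tokens are assumptions made for illustration. A real Latent Diffusion Model would supply the features from its frozen attention layers, whereas random arrays stand in for them here.

```python
import numpy as np


def cross_attention_map(image_feats, text_feats):
    """Scaled dot-product attention weights (hypothetical helper).

    image_feats: (H*W, d) array of spatial queries.
    text_feats:  (T, d) array of text-token keys.
    Returns an (H*W, T) array whose rows are softmax-normalized over tokens.
    """
    d = image_feats.shape[-1]
    logits = image_feats @ text_feats.T / np.sqrt(d)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)


def heatmap_for_tokens(attn, token_ids, height, width):
    """Average attention over the chosen tokens and min-max scale to [0, 1]."""
    heat = attn[:, token_ids].mean(axis=-1).reshape(height, width)
    lo, hi = heat.min(), heat.max()
    return (heat - lo) / (hi - lo + 1e-8)


def iou(pred_mask, gt_mask):
    """Intersection over union of two boolean masks (0.0 if both are empty)."""
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return float(inter) / float(union) if union > 0 else 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, T, d = 8, 8, 6, 16                    # toy sizes, assumed
    image_feats = rng.normal(size=(H * W, d))   # stand-in for frozen image features
    text_feats = rng.normal(size=(T, d))        # stand-in for text-token features

    attn = cross_attention_map(image_feats, text_feats)
    heat = heatmap_for_tokens(attn, token_ids=[2, 3], height=H, width=W)

    pred = heat > 0.5                           # assumed fixed threshold
    gt = np.zeros((H, W), dtype=bool)
    gt[2:6, 2:6] = True                         # toy ground-truth bounding box
    print("IoU on random features:", iou(pred, gt))
```

No weights are updated anywhere in this sketch, which mirrors the zero-shot setting described above: the only free choices are which tokens to read out and how to threshold the resulting heatmap.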

Journal: IEEE Journal of Biomedical and Health Informatics
