Abstract

Gastric intestinal metaplasia (GIM) is a frequent lesion of the gastrointestinal tract, and its accurate clinical grading can significantly reduce the risk of progression to gastric cancer. Pre-trained vision–language models have demonstrated impressive generalizability in many domains. However, they suffer from the following limitations when applied to GIM grading: 1) small GIM lesions in endoscopic images and their colour features are difficult to capture with global representations alone; 2) medical concepts are highly specialized, so their dependencies and semantics cannot be precisely represented; and 3) owing to the scarcity of medical prior knowledge, multi-modal prompts over-rely on random initialization and fail to maintain semantic consistency. To address these challenges, we propose a self-supervised visual–textual prompt (SVTP) learning strategy for few-shot GIM grading. Specifically, image and colour prompts are jointly configured in the visual branch to exploit the colour information in lesion areas, and a coupling function is introduced to enable the exchange of high-dimensional information between them. Then, to enhance the encoding of complex information and model the dependencies between medical concepts, text prompts and an adapter are integrated into the textual branch. Finally, a three-dimensional contrastive learning (3DCL) strategy is proposed to enrich the medical prior knowledge of the prompts and maintain their semantic consistency, where the three modal prompts are projected into a unified 3D space for contrastive learning. We evaluated SVTP on a public endoscopic image dataset and a private GIM grading dataset. Experimental results demonstrate that SVTP achieves state-of-the-art grading performance on both datasets.
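The core of the 3DCL strategy described above, projecting three modal prompt representations into a shared space and contrasting them pairwise, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the projection matrices, dimensions, temperature, and the use of a symmetric InfoNCE-style loss over all three modality pairs are assumptions for demonstration.

```python
import numpy as np

def project(x, W):
    """Linearly project prompt features into the shared space and L2-normalise (assumed projection)."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def info_nce(za, zb, tau=0.07):
    """Symmetric InfoNCE loss between two batches of normalised embeddings."""
    logits = za @ zb.T / tau
    labels = np.arange(len(za))

    def xent(l):
        # numerically stable softmax cross-entropy with matched pairs on the diagonal
        l = l - l.max(axis=1, keepdims=True)
        p = np.exp(l) / np.exp(l).sum(axis=1, keepdims=True)
        return -np.log(p[labels, labels]).mean()

    return 0.5 * (xent(logits) + xent(logits.T))

def three_way_contrastive(img, col, txt, Wi, Wc, Wt):
    """Sum the pairwise contrastive losses over image, colour, and text prompt
    features projected into one unified space (a sketch of the 3DCL idea)."""
    zi, zc, zt = project(img, Wi), project(col, Wc), project(txt, Wt)
    return info_nce(zi, zc) + info_nce(zi, zt) + info_nce(zc, zt)

rng = np.random.default_rng(0)
B, D, P = 4, 16, 8  # batch size, feature dim, projection dim (illustrative values)
loss = three_way_contrastive(
    rng.normal(size=(B, D)), rng.normal(size=(B, D)), rng.normal(size=(B, D)),
    rng.normal(size=(D, P)), rng.normal(size=(D, P)), rng.normal(size=(D, P)),
)
print(float(loss))
```

In this sketch, each modality pair contributes a symmetric contrastive term, which pulls the three prompt embeddings of the same sample together in the shared space while pushing apart those of different samples.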
