MIL-ViT: A multiple instance vision transformer for fundus image classification

Qi Bi, Xu Sun, Shuang Yu, Kai Ma, Cheng Bian, Munan Ning, Nanjun He, Yawen Huang, Yuexiang Li, Hanruo Liu, Yefeng Zheng

doi:10.1016/j.jvcir.2023.103956

Abstract

Despite the success of deep learning, retinal disease classification remains challenging because early-stage pathological regions can be extremely small and subtle, making them difficult for networks to detect. Feature representations that focus mainly on the local view may yield insufficiently discriminative semantic-level representations. Conversely, representations that focus mainly on global semantics may overlook subtle but discriminative local pathological regions. To address this issue, we propose a hybrid framework that combines the strong global semantic representation learning of the vision Transformer (ViT) with the local representation extraction capacity of conventional multiple instance learning (MIL). Specifically, we implement a multiple instance vision Transformer (MIL-ViT) in which the vanilla ViT branch and the MIL branch generate semantic probability distributions separately, and a bag consistency loss is proposed to minimize the difference between them. Moreover, a calibrated attention mechanism is developed to embed the instance representations into the bag representation. To further improve the feature representation capability for fundus images, we pre-train the vanilla ViT on a large-scale fundus image database. Experimental results show that a model initialized with our pre-trained weights generalizes better for fundus disease diagnosis than one initialized with ImageNet pre-trained weights. Extensive experiments on four publicly available benchmarks demonstrate that the proposed MIL-ViT outperforms recent fundus image classification methods, including various deep learning models and deep MIL methods. All source code and pre-trained models are publicly available at https://github.com/greentreeys/MIL-VT.
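
The two ideas named in the abstract, pooling instance representations into a bag representation and penalizing disagreement between the ViT and MIL branch predictions, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The abstract specifies neither the form of the calibrated attention nor the divergence used in the bag consistency loss, so plain softmax attention pooling and a symmetric KL divergence are used here as stand-in assumptions, and all function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(instances, attn_scores):
    """Pool instance embeddings into one bag embedding.

    ASSUMPTION: plain softmax attention pooling, standing in for the
    paper's "calibrated attention", whose details are not in the abstract.
    """
    weights = softmax(attn_scores)
    dim = len(instances[0])
    return [
        sum(w * inst[d] for w, inst in zip(weights, instances))
        for d in range(dim)
    ]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def bag_consistency(p_vit, p_mil):
    """Penalty for disagreement between the two branches' class probabilities.

    ASSUMPTION: symmetric KL divergence; the abstract only says the loss
    minimizes the difference between the two distributions.
    """
    return 0.5 * (kl_divergence(p_vit, p_mil) + kl_divergence(p_mil, p_vit))


if __name__ == "__main__":
    # Three hypothetical patch (instance) embeddings of dimension 2.
    patches = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    # Hypothetical attention scores, e.g. produced by a small scoring network.
    scores = [2.0, 0.1, 0.1]
    print("bag embedding:", attention_pool(patches, scores))

    # Hypothetical class probabilities from the ViT branch and the MIL branch.
    print("consistency penalty:", bag_consistency([0.8, 0.2], [0.6, 0.4]))
```

The penalty is zero when both branches output the same distribution and grows as they diverge, which is the behavior the abstract describes for the bag consistency loss.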

Journal: Journal of Visual Communication and Image Representation
Publication Date: Oct 18, 2023
Citations: 7

Similar Papers

MIL-VT: Multiple Instance Learning Enhanced Vision Transformer for Fundus Image Classification
Shuang Yu ... Cheng Bian
01 Jan 2020

Visual sentiment analysis via deep multiple clustered instance learning
Wenjing Gao ... Yonghua Zhu
Journal of Intelligent & Fuzzy Systems | VOL. 39
01 Jan 2020

Design and Analysis of Techniques for Multiple-Instance Learning in the Presence of Balanced and Skewed Class Distributions
01 Jan 2015

Weakly supervised histopathology cancer image segmentation and classification
Yan Xu ... Jun-Yan Zhu
Medical Image Analysis | VOL. 18
22 Feb 2014
