For brain diseases, e.g., autism spectrum disorder (ASD), with unclear biological characteristics, the detection of imaging-based biomarkers is a critical task for diagnosis. Several landmark-based categorization approaches have been developed for the computer-aided diagnosis of brain diseases, such as Alzheimer’s disease (AD), utilizing structural magnetic resonance imaging (sMRI). With the automatic detection of the landmarks of brain disease, more detailed brain features were identified for clinical diagnosis. Multi-instance learning is an effective technique for classifying brain diseases based on landmarks. The multiple-instance learning approach relies on the assumption of independent distribution hypotheses and is mostly focused on local information, thus the correlation among different brain regions may be ignored. However, according to previous research on ASD and AD, the abnormal development of different brain regions is highly correlated. Vision Transformers, with self-attention modules to capture the relationship between embedded patches from a whole image, have recently demonstrated superior performances in many computer vision tasks. Nevertheless, the utilization of 3D brain MRIs imposes a substantial computational load, especially while training with Vision Transformer. To address the challenges mentioned above, in this research, we proposed a landmark-based multi-instance Conv-Transformer (LD-MILCT) framework as a solution to the aforementioned issues in brain disease diagnosis. In this network, a two-stage multi-instance learning strategy was proposed to explore both spatial and morphological information between different brain regions; the Vision Transformer utilizes a multi-instance learning head (MIL head) to fully utilize the features that are not involved in the ultimate classification. We assessed our proposed framework using T1-weighted MRI images from both AD and ASD databases. Our method outperformed existing deep learning and landmark-based methods in terms of brain MRI classification tasks.
Read full abstract