Recurrent 3-D Multi-Level Visual Transformer for Joint Classification of Heterogeneous 2-D and 3-D Radiographic Data
Recent advances in artificial intelligence for medical imaging show significant potential for automating the detection of lung infections from chest radiographic scans. However, current approaches typically handle either 2-D or 3-D scans alone and therefore fail to exploit the complementary strengths of both modalities. Moreover, conventional slice-based methods burden radiologists with manual slice selection. To overcome these challenges, we propose the Recurrent 3-D Multi-level Vision Transformer (R3DM-ViT), a model that processes multimodal data to improve diagnostic accuracy. Our quantitative evaluations show that R3DM-ViT outperforms existing methods, achieving an accuracy of $96.67\%$, an F1-score of $96.88\%$, a mean average precision of $96.75\%$, and a mean average recall of $97.02\%$. This research marks a substantial step forward in the automated detection of lung infections through multimodal imaging.
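For readers who want to see how the four reported metrics relate, the following is a minimal sketch, not the authors' evaluation code. It assumes that "mean average precision" and "mean average recall" denote per-class precision and recall averaged over classes (macro averaging), and that the F1-score is the harmonic mean of those two averages. The abstract does not state these definitions, although the reported F1-score of $96.88\%$ is numerically consistent with the harmonic mean of the reported $96.75\%$ and $97.02\%$. The function names and the toy confusion matrix are hypothetical.

```python
def harmonic_mean(precision, recall):
    """F1-style harmonic mean of two scores (returns 0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def macro_metrics(confusion):
    """Compute accuracy and class-averaged precision, recall, and F1.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j (assumed layout).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))

    precisions, recalls = [], []
    for k in range(n):
        true_pos = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precisions.append(true_pos / predicted_k if predicted_k else 0.0)
        recalls.append(true_pos / actual_k if actual_k else 0.0)

    precision = sum(precisions) / n
    recall = sum(recalls) / n
    return {
        "accuracy": correct / total,
        "precision": precision,
        "recall": recall,
        "f1": harmonic_mean(precision, recall),  # assumed F1 definition
    }


# Toy two-class confusion matrix (made-up numbers, for illustration only).
toy = [[5, 1],
       [2, 4]]
metrics = macro_metrics(toy)

# Consistency check on the values reported in the abstract (in percent).
reported_f1 = harmonic_mean(96.75, 97.02)
```

Under these assumptions, `reported_f1` rounds to the F1-score quoted in the abstract, which suggests, but does not confirm, that the F1-score was derived from the averaged precision and recall.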