Abstract

In this research, we developed a two-stage deep learning (DL) model based on the Vision Transformer (ViT) to detect COVID-19 and assess its severity from thoracic CT images. In the first stage, we used a pre-trained ViT model (ViT_B/32) and a custom CNN model to classify CT images as COVID-19 or non-COVID-19. The ViT model achieved superior performance, with a fivefold cross-validated accuracy of 99.7%, compared with the custom CNN's 98%. In the second stage, we employed a ViT-based U-Net model (Vision Transformer for Biomedical Image Segmentation, VITBIS) to segment lung and infection regions in COVID-19-positive CT images and thereby determine infection severity. This model uses transformers with attention mechanisms in both the encoder and the decoder. The lung segmentation network achieved an Intersection over Union (IoU) of 95.8% and a sensitivity of 99.67%, while the lesion segmentation network attained an IoU of 94% and a sensitivity of 98.3%.
