Abstract

Convolutional Neural Networks (CNNs) have been the standard approach to automated clinical image diagnosis for the past decade. Vision Transformers (ViTs) are newer to this domain and achieve performance comparable to CNNs, which makes them a competitive alternative. This paper proposes an off-the-shelf ViT-based approach to detecting lung diseases. We compare this approach with a CNN-based hybrid deep learning model that outperforms other existing deep learning techniques. The hybrid model used for comparison is called Visual Geometry Group Data Spatial Transformer with CNN (VDSNet), and the experimental results are computed on the open-source NIH chest X-ray dataset from Kaggle. In this study, we observe that pre-trained Vision Transformers outperform the CNN-based VDSNet on several metrics, both on the full dataset and on different subsets of it. Vision Transformers also show an increase in accuracy as internal layers are added and the patch size is reduced, at the expense of slightly higher training time, making them a potential alternative to Convolutional Neural Networks.
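The link between patch size and training cost can be illustrated with a little arithmetic. The sketch below is not taken from the paper: the 224-pixel input size and the patch sizes 32 and 16 are assumed values chosen for illustration, and `vit_token_count` is a hypothetical helper. It only shows that a ViT splits a square image into non-overlapping patches, so halving the patch size quadruples the number of tokens the transformer must process.

```python
def vit_token_count(image_size: int, patch_size: int) -> int:
    """Number of patch tokens for a square image (class token not counted)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side


if __name__ == "__main__":
    # Assumed values for illustration only; the abstract does not state them.
    image_size = 224
    for patch_size in (32, 16):
        tokens = vit_token_count(image_size, patch_size)
        print(f"patch {patch_size}x{patch_size}: {tokens} tokens")
```

Smaller patches give the model a finer view of the image but a longer token sequence, which is one plausible reason training time rises as patch size falls.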
