The rapid spread of COVID-19 across the globe since its emergence has pushed many countries’ healthcare systems to the verge of collapse. To restrict the spread of the disease and lessen the ongoing cost on the healthcare system, it is critical to appropriately identify COVID-19-positive individuals and isolate them as soon as possible. The primary COVID-19 screening test, RT-PCR, although accurate and reliable, has a long turn-around time. More recently, various researchers have demonstrated the use of deep learning approaches on chest X-ray (CXR) for COVID-19 detection. However, existing Deep Convolutional Neural Network (CNN) methods fail to capture the global context due to their inherent image-specific inductive bias. In this article, we investigated the use of vision transformers (ViT) for detecting COVID-19 in Chest X-ray (CXR) images. Several ViT models were fine-tuned for the multiclass classification problem (COVID-19, Pneumonia and Normal cases). A dataset consisting of 7598 COVID-19 CXR images, 8552 CXR for healthy patients and 5674 for Pneumonia CXR were used. The obtained results achieved high performance with an Area Under Curve (AUC) of 0.99 for multi-class classification (COVID-19 vs. Other Pneumonia vs. normal). The sensitivity of the COVID-19 class achieved 0.99. We demonstrated that the obtained results outperformed comparable state-of-the-art models for detecting COVID-19 on CXR images using CNN architectures. The attention map for the proposed model showed that our model is able to efficiently identify the signs of COVID-19.