Abstract

Deep-learning models have demonstrated remarkable performance in a variety of fields, owing to advancements in computational power and the availability of extensive datasets for training large-scale models. Nonetheless, these models inherently possess a vulnerability wherein even small alterations to the input can lead to substantially different outputs. Consequently, it is imperative to assess the robustness of deep-learning models prior to relying on their decision-making capabilities. In this study, we investigate the adversarial robustness of convolutional neural networks (CNNs), vision transformers (ViTs), and hybrid CNN+ViT models, which represent prevalent architectures in computer vision. Our evaluation is grounded in four novel model-sensitivity metrics that we introduce, measured under random noise and gradient-based adversarial perturbations. To ensure a fair comparison, we employ models with comparable capacities within each group and conduct experiments separately, using ImageNet-1K and ImageNet-21K as pretraining data. Under this controlled setting, our results provide empirical evidence that ViT-based models exhibit higher adversarial robustness than their CNN-based counterparts, helping to dispel doubts about the findings of prior studies. Additionally, the proposed metrics contribute new insights into previously unconfirmed characteristics of these models.
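The abstract does not name a specific gradient-based attack, so the following is only a minimal sketch, assuming an FGSM-style perturbation in PyTorch to illustrate how a gradient-based adversarial perturbation is typically generated; the `fgsm_perturb` helper and the `epsilon` value are illustrative assumptions, not the paper's stated method.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, images, labels, epsilon=0.03):
    """Illustrative FGSM-style gradient-based perturbation (assumed, not the paper's exact attack).

    Each pixel is shifted by +/- epsilon in the direction that increases
    the classification loss, then clipped back to the valid image range.
    """
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    # The sign of the input gradient gives the steepest adversarial direction.
    adv_images = images + epsilon * images.grad.sign()
    return adv_images.clamp(0.0, 1.0).detach()
```

A sensitivity-style measurement could then compare the model's outputs on clean versus perturbed inputs, for example by the drop in accuracy or the change in predicted logits as `epsilon` grows.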
