Currently, more than 1.4 million asteroids are known in the main belt. Future surveys, such as those that the Vera C. Rubin Observatory will perform, may increase this number to as many as 8 million. In the past, asteroids interacting with secular resonances were identified by visual inspection of images of their resonant arguments, a method that is no longer feasible in the age of big data. Deep learning methods based on Convolutional Neural Networks (CNNs) have recently been used to automatically classify databases of several thousand images of resonant arguments for resonances such as the ν6, the g−2g6+g5, and the s−s6−g5+g6. However, it has been shown that computer vision methods based on the Transformer architecture tend to outperform CNN models when the image database is sufficiently large. Here, for the first time, we developed a Vision Transformer (ViT) model and applied it to publicly available databases for the three secular resonances listed above. The ViT architecture outperforms CNN models in both speed and accuracy while avoiding overfitting concerns. Provided that hyper-parameter tuning is performed for each analyzed database, ViT models should be preferred over CNN architectures.
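
To illustrate the kind of pipeline the abstract describes, the sketch below fine-tunes an ImageNet-pre-trained ViT-B/16 from torchvision to classify resonant-argument images. This is a minimal sketch under stated assumptions, not the configuration used in this work: the three class names, the directory layout, and all hyper-parameters are hypothetical placeholders for illustration.

```python
# Minimal sketch: fine-tuning a pre-trained ViT-B/16 on resonant-argument
# images. Class names, paths, and hyper-parameters are assumptions, not the
# actual setup of this study.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import models
from torchvision.datasets import ImageFolder

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load an ImageNet-pre-trained ViT-B/16 and its matching preprocessing.
weights = models.ViT_B_16_Weights.IMAGENET1K_V1
preprocess = weights.transforms()
model = models.vit_b_16(weights=weights)

# Replace the classification head with one sized for the assumed number of
# resonant-argument classes (e.g., circulating / librating / switching).
num_classes = 3
model.heads.head = nn.Linear(model.heads.head.in_features, num_classes)
model = model.to(device)

# Hypothetical dataset layout: data/train/<class_name>/*.png
train_set = ImageFolder("data/train", transform=preprocess)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

model.train()
for epoch in range(5):  # illustrative epoch count
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * images.size(0)
    print(f"epoch {epoch}: mean loss {running_loss / len(train_set):.4f}")
```

Starting from pre-trained weights is one way a ViT can avoid the overfitting concerns mentioned above when the labeled database holds only a few thousand images, since the attention layers arrive already trained on a much larger corpus.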