Abstract

A weather-image-classification model combining a Vision Transformer (ViT) with a dual augmented attention module is proposed to address two problems: the insufficient feature-extraction capability of traditional deep-learning methods, whose recognition accuracy still leaves room for improvement, and the limited range of weather phenomena covered by existing datasets. A pretrained Vision Transformer acquires the basic semantic feature representation of weather images. The dual augmented attention module, combining convolutional self-attention and Atrous self-attention, extracts low-level and high-level deep semantic representations, respectively; the resulting feature vectors are concatenated and fed into a linear layer to predict the weather type. Experimental validation is performed on the publicly available standard weather-image datasets MWD (Multi-class Weather Database) and WEAPD (Weather Phenomenon Database), and the two datasets are also merged to broaden the range of weather phenomena the model can recognize. The model achieves F1 scores of 97.47%, 87.69%, and 92.73% on MWD, WEAPD, and the merged dataset, respectively. These scores exceed those of the recent high-performing deep-learning models in the experimental comparison, demonstrating the effectiveness of the model.
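The pipeline the abstract describes (pretrained ViT tokens feeding two attention branches whose outputs are concatenated and passed through a linear classifier) can be sketched numerically. The following is a minimal NumPy illustration, not the authors' implementation: the dilation-based sparse attention is a stand-in for Atrous self-attention, dense attention stands in for the convolutional branch, and the token shapes and six-class output are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, dilation=1):
    """Single-head self-attention over ViT tokens.

    With dilation > 1, each query attends only to every `dilation`-th
    token -- a simplified stand-in for Atrous (dilated) self-attention.
    """
    d = tokens.shape[-1]
    keys = tokens[::dilation]                 # sparse (atrous) key/value set
    scores = tokens @ keys.T / np.sqrt(d)     # scaled dot-product scores
    return softmax(scores) @ keys             # attention-weighted values

# Assumed shapes: 197 ViT tokens (1 class token + 14x14 patches), dim 768
vit_tokens = rng.standard_normal((197, 768)).astype(np.float32)

# Dual branches: a dense branch (conv-like, dilation=1) for low-level
# features and an atrous branch (dilation=2) for high-level features;
# we read out the class token ([0]) from each branch.
low_level = self_attention(vit_tokens, dilation=1)[0]
high_level = self_attention(vit_tokens, dilation=2)[0]

# Concatenate the two feature vectors and classify with a linear layer.
features = np.concatenate([low_level, high_level])        # shape (1536,)
num_classes = 6  # assumed number of weather categories
W = rng.standard_normal((features.shape[0], num_classes)) * 0.01
pred = int(np.argmax(features @ W))                       # predicted class id
```

The concatenation doubles the feature dimension before the linear head, which is why the classifier weight matrix has 2 × 768 input rows in this sketch.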
