ABSTRACT Recently, deep learning has been successfully applied to hyperspectral image classification, and some convolutional neural network (CNN)-based models have already achieved attractive classification results. However, since hyperspectral data form a spectral-spatial cube that can generally be regarded as sequential data along the spectral dimension, CNN models perform poorly on such sequences. Unlike CNNs, which mainly model local relationships in images, the transformer has been shown to be a powerful structure for modeling sequential data. In the self-attention (SA) module of ViT, each token is updated by aggregating all tokens' features according to the self-attention graph. In this way, tokens can exchange information sufficiently with one another, which provides a powerful representation capability. However, as the layers become deeper, the transformer model suffers from network degradation. Therefore, in order to improve layer-to-layer information exchange and alleviate the network degradation problem, we propose a Weighted Residual Self-attention Graph-based Transformer (RSAGformer) model for hyperspectral image classification built on the self-attention mechanism. It effectively alleviates the network degradation problem of deep transformer models by fusing the self-attention information between adjacent layers and extracts data features effectively. Extensive experimental evaluation on six public hyperspectral datasets shows that the RSAGformer yields competitive classification results.
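The abstract describes fusing the self-attention information of adjacent layers through a weighted residual connection on the attention graph. The following is a minimal PyTorch sketch of that idea, not the authors' implementation: the fusion rule (a single learnable scalar `alpha` blending the current and previous attention maps) and all layer names are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class WeightedResidualSelfAttention(nn.Module):
    """Sketch of a self-attention block that fuses the current attention
    graph with the previous layer's attention graph via a learnable weight.
    The exact fusion formulation used by RSAGformer is assumed here."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # hypothetical learnable weight blending adjacent layers' attention graphs
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, x, prev_attn=None):
        b, n, d = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each: (b, heads, n, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # current self-attention graph
        attn = attn.softmax(dim=-1)
        if prev_attn is not None:
            # weighted residual fusion of the adjacent layers' attention graphs
            attn = self.alpha * attn + (1.0 - self.alpha) * prev_attn
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        # return the fused attention so the next layer can reuse it
        return self.proj(out), attn
```

In a deep stack, each block would receive the previous block's attention map, so the attention graph itself carries a residual path across layers in addition to the usual feature residuals.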