In recent years, deep learning-based classification methods for hyperspectral images (HSIs) have gained widespread popularity in fields such as agriculture, environmental monitoring, and geological exploration, owing to their ability to extract features automatically and deliver outstanding performance. This study proposes a new Dilated Spectral–Spatial Gaussian Transformer Net (DSSGT) model. The DSSGT model incorporates dilated convolutions as shallow feature extraction units, which expand the receptive field while maintaining computational efficiency, and integrates a transformer architecture to capture feature relationships and generate deep fusion features, thereby enhancing classification accuracy. We used consecutive dilated convolutional layers to extract joint low-level spectral–spatial features. We then introduced Gaussian Weighted Pixel Embedding blocks, which leverage Gaussian weight matrices to transform the joint features into pixel-level vectors; by combining each pixel's features with those of its neighbouring pixels, these blocks yield pixel-level representations that are more expressive and context-aware. The transformed vector matrix was fed into the transformer encoder module, which captures global dependencies within the input data and generates higher-level fusion features with improved expressiveness and discriminability. We evaluated the proposed DSSGT model on five hyperspectral image datasets through comparative experiments. The results demonstrate the superior performance of our approach compared with current state-of-the-art methods, providing compelling evidence of the DSSGT model's effectiveness.
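As a minimal sketch of the pipeline the abstract describes, assuming PyTorch, the following illustrates the three stages in order: consecutive dilated convolutions for joint low-level spectral–spatial features, a Gaussian-weighted pixel embedding that mixes each pixel with its neighbours before projecting to token vectors, and a transformer encoder over the resulting pixel tokens. Every layer size, dilation rate, kernel width, and the exact form of the Gaussian weighting is an assumption for illustration; the abstract does not specify these details.

```python
# Illustrative sketch of a DSSGT-style pipeline; hyperparameters are assumptions.
import torch
import torch.nn as nn

class DilatedSpectralSpatialBlock(nn.Module):
    """Consecutive dilated convolutions extracting joint spectral-spatial features."""
    def __init__(self, in_bands, channels=64, dilations=(1, 2, 3)):
        super().__init__()
        layers, c = [], in_bands
        for d in dilations:
            # padding=d with dilation=d and a 3x3 kernel preserves spatial size
            layers += [nn.Conv2d(c, channels, kernel_size=3, padding=d, dilation=d),
                       nn.BatchNorm2d(channels), nn.ReLU(inplace=True)]
            c = channels
        self.net = nn.Sequential(*layers)

    def forward(self, x):              # x: (B, bands, H, W)
        return self.net(x)             # (B, channels, H, W)

class GaussianWeightedPixelEmbedding(nn.Module):
    """Weights each pixel's neighbourhood with a fixed Gaussian kernel, then
    projects every pixel to an embedding vector (one token per pixel)."""
    def __init__(self, channels, embed_dim=128, kernel_size=5, sigma=1.0):
        super().__init__()
        ax = torch.arange(kernel_size) - kernel_size // 2
        g = torch.exp(-(ax[None, :] ** 2 + ax[:, None] ** 2) / (2 * sigma ** 2))
        g = (g / g.sum()).view(1, 1, kernel_size, kernel_size).repeat(channels, 1, 1, 1)
        self.register_buffer("gauss", g)
        self.channels, self.pad = channels, kernel_size // 2
        self.proj = nn.Linear(channels, embed_dim)

    def forward(self, x):              # x: (B, C, H, W)
        # Depthwise convolution: each pixel is blended with its neighbours,
        # weighted by spatial distance via the Gaussian kernel.
        x = nn.functional.conv2d(x, self.gauss, padding=self.pad, groups=self.channels)
        tokens = x.flatten(2).transpose(1, 2)      # (B, H*W, C)
        return self.proj(tokens)                   # (B, H*W, embed_dim)

class DSSGT(nn.Module):
    def __init__(self, in_bands, num_classes, embed_dim=128, depth=4, heads=8):
        super().__init__()
        self.features = DilatedSpectralSpatialBlock(in_bands)
        self.embed = GaussianWeightedPixelEmbedding(64, embed_dim)
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):              # x: (B, bands, H, W) patch around a pixel
        tokens = self.embed(self.features(x))      # (B, H*W, embed_dim)
        tokens = self.encoder(tokens)              # global dependencies across pixels
        return self.head(tokens.mean(dim=1))       # class logits for the patch

model = DSSGT(in_bands=103, num_classes=9)         # band/class counts are examples
logits = model(torch.randn(2, 103, 9, 9))          # -> (2, 9)
```

In this sketch the patch-level tokens are mean-pooled before classification; whether the actual model pools, uses a class token, or classifies the centre pixel's token is not stated in the abstract.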