Abstract

Hyperspectral images (HSIs) have been widely used in Earth observation because they contain continuous and detailed spectral information that benefits the fine-grained diagnosis of land cover. Over the past few years, convolutional neural network (CNN)-based methods have shown limitations in modeling spectral-wise long-range dependencies. Recently, transformer-based deep learning methods have been proposed and have shown superiority in modeling the continuous representation of spectral signatures because the self-attention (SA) mechanism has a global receptive field. However, owing to the special tokenization used by transformer-based methods, redundant tokens contained in the spectral embeddings are always involved in the SA operation, even though they contribute nothing positive to classification. In particular, the overlapped group-wise tokenization approach may aggravate the Hughes phenomenon and impose additional computation. To address this issue, a lightweight spatial–spectral pyramid transformer (SSPT) framework is proposed to efficiently extract the spatial–spectral features of HSIs by progressively reducing redundant tokens in an end-to-end manner. Specifically, a token reduction (TR) method is proposed to decide which tokens are retained by computing and comparing the attentiveness between the spectral embeddings and the class token. In addition, for tokens identified as redundant, a token compensation mechanism is proposed to automatically extract supplementary information for classification. Extensive experiments on three standard datasets quantitatively show the superiority of our methods, and ablation experiments qualitatively support our hypothesis about the feature distribution in the transformer architecture.
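To make the TR idea concrete, below is a minimal PyTorch sketch, not the authors' implementation, of attentiveness-based token reduction with a compensation token: patch tokens are ranked by the class token's attention weights, the top-ranked tokens are kept, and the remainder are fused into a single compensation token by attentiveness-weighted averaging (one plausible instantiation of the compensation mechanism described above). All names here, including `TokenReduction`, `keep_ratio`, and `cls_attn`, are assumptions introduced for illustration.

```python
# Hedged sketch (not the authors' code): class-token-attentiveness-based
# token reduction with a single compensation token. Names are hypothetical.
import torch
import torch.nn as nn


class TokenReduction(nn.Module):
    """Keep the tokens most attended to by the class token; fuse the rest.

    Assumes the usual ViT layout: x[:, 0] is the class token, x[:, 1:]
    are the spectral/patch tokens, and `cls_attn` holds the class token's
    attention weights over the patch tokens (shape: batch x num_tokens),
    e.g. averaged over attention heads.
    """

    def __init__(self, keep_ratio: float = 0.5):
        super().__init__()
        self.keep_ratio = keep_ratio

    def forward(self, x: torch.Tensor, cls_attn: torch.Tensor) -> torch.Tensor:
        cls_tok, patches = x[:, :1], x[:, 1:]            # (B,1,D), (B,N,D)
        n_keep = max(1, int(patches.size(1) * self.keep_ratio))

        # Rank patch tokens by how strongly the class token attends to them.
        idx = cls_attn.argsort(dim=1, descending=True)   # (B, N)
        keep_idx, drop_idx = idx[:, :n_keep], idx[:, n_keep:]

        def gather(i: torch.Tensor) -> torch.Tensor:
            return patches.gather(
                1, i.unsqueeze(-1).expand(-1, -1, patches.size(-1)))

        kept = gather(keep_idx)                          # (B, n_keep, D)
        dropped = gather(drop_idx)                       # (B, N-n_keep, D)

        # Compensation: fuse the "redundant" tokens into one extra token,
        # weighted by their renormalized attentiveness scores.
        w = cls_attn.gather(1, drop_idx)
        w = (w / w.sum(dim=1, keepdim=True).clamp_min(1e-6)).unsqueeze(-1)
        comp = (w * dropped).sum(dim=1, keepdim=True)    # (B, 1, D)

        return torch.cat([cls_tok, kept, comp], dim=1)
```

Fusing rather than simply discarding the low-attentiveness tokens preserves a summary of the residual spectral information, which is consistent with the abstract's claim that the compensation mechanism supplies supplementary information for classification.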
