Abstract

Convolutional neural networks (CNNs) have become the mainstream methods for hyperspectral image (HSI) classification owing to their powerful ability to extract local features. However, CNNs fail to capture the long-range contextual information and diagnostic spectral information of HSIs effectively and efficiently. In contrast, leading-edge vision transformers can capture long-range dependencies and process sequential data such as spectral signatures. Nevertheless, existing transformer-based classification methods generally generate inaccurate token embeddings from a single spectral or spatial dimension of raw HSIs and have difficulty modeling locality when training data are insufficient. To mitigate these limitations, we propose a novel local-enhanced spectral-spatial transformer (LESSFormer) devised specifically for HSI classification. LESSFormer comprises two effective and efficient modules: the HSI2Token module and the local-enhanced transformer encoder. The former transforms the HSI into adaptive spectral-spatial tokens; the latter further enhances the representation ability of these tokens by explicitly reinforcing local information with a simple attention mask while retaining long-range information. Extensive experiments on the new Xiong’an dataset and the widely used Pavia University and Houston University datasets show the superiority of LESSFormer over other state-of-the-art networks.
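The abstract describes reinforcing local information with a simple additive attention mask while retaining long-range dependencies, but gives no implementation details. The following is a minimal numpy sketch under assumed specifics: spatial tokens arranged on a flat grid, a fixed neighborhood window, and a constant additive bias on local attention scores (the function name, grid layout, window size, and bias value are all hypothetical, not the authors' exact formulation).

```python
import numpy as np

def local_enhanced_attention(tokens, window=1, grid=(4, 4)):
    """Self-attention over spatial tokens with an additive locality mask.

    Scores between tokens whose grid positions fall within `window` of each
    other receive a constant bias, emphasising local neighbours while still
    allowing non-zero attention to distant tokens (long-range links survive
    the softmax). Hypothetical sketch of the idea in the abstract.
    """
    n, d = tokens.shape
    assert n == grid[0] * grid[1], "tokens must tile the spatial grid"

    # plain scaled dot-product attention scores (queries = keys = tokens)
    scores = tokens @ tokens.T / np.sqrt(d)

    # boolean locality mask from the tokens' 2-D grid coordinates
    rows, cols = np.divmod(np.arange(n), grid[1])
    local = (np.abs(rows[:, None] - rows[None, :]) <= window) & \
            (np.abs(cols[:, None] - cols[None, :]) <= window)

    # additive bias inside the window, zero elsewhere
    scores = scores + np.where(local, 1.0, 0.0)

    # numerically stable softmax over keys
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn = e / e.sum(axis=-1, keepdims=True)
    return attn @ tokens, attn
```

Because the mask is additive rather than hard (no `-inf` outside the window), every token still attends to every other token; the bias only tilts the distribution toward the local neighborhood.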

