Abstract

Hyperspectral images (HSIs) contain abundant information in the spatial and spectral domains, allowing precise characterization of material categories. Convolutional neural networks (CNNs) have achieved great success in HSI classification owing to their strength in local contextual modeling. However, CNNs rely on fixed filter weights and deep stacks of convolutional layers, which lead to a limited receptive field and a high computational burden. The recent Vision Transformer (ViT) models long-range dependencies with a self-attention mechanism and has become an alternative to the CNN backbones traditionally used in HSI classification. However, such transformer-based architectures treat every input pixel in the receptive field as a feature token during embedding and self-attention, which inevitably limits their ability to learn multi-scale features and increases the computational cost. To overcome this issue, we propose a local semantic feature aggregation-based transformer (LSFAT) architecture that allows transformers to represent long-range dependencies among multi-scale features more efficiently. We introduce the concept of the homogeneous region into the transformer through a pixel aggregation strategy and further propose neighborhood aggregation-based embedding (NAE) and attention (NAA) modules, which adaptively form multi-scale features and capture local spatial semantics among them in a hierarchical transformer architecture. A reusable classification token is included alongside the feature tokens in the attention computation, and in the final stage a fully connected layer performs classification on this token after transformer encoding. Extensive experiments verify the effectiveness of the NAE and NAA modules relative to the traditional ViT, and our results demonstrate excellent classification performance of the proposed method in comparison with other state-of-the-art approaches on several public HSI datasets.
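The core ideas in the abstract — aggregating spatial neighborhoods of pixel tokens into coarser multi-scale tokens between attention stages, while a reusable classification token rides along in every attention computation — can be illustrated with a minimal numpy sketch. This is not the paper's actual NAE/NAA implementation; the function names, the average-pooling aggregation rule, and the single-head unparameterized attention are simplifying assumptions made purely for illustration.

```python
import numpy as np

def neighborhood_aggregate(tokens, h, w, k=2):
    """Hypothetical stand-in for neighborhood-aggregation embedding:
    average-pool non-overlapping k x k neighborhoods of an h x w token
    grid into coarser tokens (one stage of the hierarchy)."""
    d = tokens.shape[-1]
    grid = tokens.reshape(h, w, d)
    # collapse each k x k spatial block into a single aggregated token
    pooled = grid.reshape(h // k, k, w // k, k, d).mean(axis=(1, 3))
    return pooled.reshape(-1, d)

def attention_with_cls(tokens, cls_token):
    """Single-head scaled dot-product self-attention over the feature
    tokens plus a reusable classification token prepended to the
    sequence (no learned projections, for brevity)."""
    x = np.vstack([cls_token, tokens])           # (1 + N, d)
    scores = x @ x.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ x
    return out[:1], out[1:]                      # updated cls, tokens

rng = np.random.default_rng(0)
h = w = 8
d = 16
tokens = rng.standard_normal((h * w, d))         # one token per pixel
cls = rng.standard_normal((1, d))                # reusable class token

# Stage 1: attend at full resolution, then aggregate neighborhoods.
cls, tokens = attention_with_cls(tokens, cls)
tokens = neighborhood_aggregate(tokens, h, w, k=2)   # 64 -> 16 tokens
# Stage 2: attend over the coarser, multi-scale tokens; the same
# class token participates again and would feed the final classifier.
cls, tokens = attention_with_cls(tokens, cls)
print(tokens.shape, cls.shape)                   # (16, 16) (1, 16)
```

Because each stage shrinks the token sequence, the quadratic cost of self-attention drops at coarser levels, which is the efficiency argument the abstract makes for aggregation-based hierarchies.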
