Abstract

Hyperspectral image (HSI) classification is the subject of intense research in remote sensing. The tremendous success of deep learning in computer vision has recently sparked the interest in applying deep learning in hyperspectral image classification. However, most deep learning methods for hyperspectral image classification are based on convolutional neural networks (CNN). Those methods require heavy GPU memory resources and run time. Recently, another deep learning model, the transformer, has been applied for image recognition, and the study result demonstrates the great potential of the transformer network for computer vision tasks. In this paper, we propose a model for hyperspectral image classification based on the transformer, which is widely used in natural language processing. Besides, we believe we are the first to combine the metric learning and the transformer model in hyperspectral image classification. Moreover, to improve the model classification performance when the available training samples are limited, we use the 1-D convolution and Mish activation function. The experimental results on three widely used hyperspectral image data sets demonstrate the proposed model’s advantages in accuracy, GPU memory cost, and running time.

Highlights

  • IntroductionHyperspectral image (HSI) classification is a focus point in remote sensing because of its many uses across fields, such as change area detection [1], land-use classification [2,3], and environmental protection [4]

  • In this paper, inspired by the Vision Transformer, we propose a lightweight network based on the transformer for hyperspectral image classification

  • We introduce the transformer architecture for hyperspectral image classification

Read more

Summary

Introduction

Hyperspectral image (HSI) classification is a focus point in remote sensing because of its many uses across fields, such as change area detection [1], land-use classification [2,3], and environmental protection [4]. Deep learning (DL) has become extremely popular because of its ability to extract features from raw data. It has been applied in computer vision tasks, such as image classification [5,6,7,8], object detection [9], semantic segmentation [10], and facial recognition [11]. Chen et al [12] proposed a stacked autoencoder for feature extraction

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call