HTD-VIT: Spectral-Spatial Joint Hyperspectral Target Detection with Vision Transformer

Haonan Qin, Weiying Xie, Yunsong Li, Qian Du

doi:10.1109/igarss46834.2022.9884695

Abstract

In hyperspectral images (HSIs), spatial context provides information complementary to the abundant spectral features. In this paper, a unified spectral-spatial framework named HTD-ViT, based on the vision transformer (ViT), is proposed for hyperspectral target detection (HTD) tasks. HTD-ViT leverages the ViT to learn discriminative spectral-spatial features of each pixel and its neighboring pixels. The spectral-spatial sequence construction operation uses the spectra in the cross-shaped region centered on the selected pixel to produce the corresponding spectral-spatial sequence for ViT processing. Furthermore, a spectral-spatial sample selection procedure based on coarse detection addresses the lack of well-labeled training instances in HTD tasks. Finally, spectral-spatial pixel-level detection combines the discriminative features from the spectral and spatial domains to suppress the background. In contrast to traditional spatial-spectral feature extraction methods, which directly stack the original spectral features with spatial neighborhood information, joint spectral-spatial inference in HTD-ViT can effectively discover the underlying contextual and structural information in HSIs. Experiments on real HSIs verify the effectiveness of HTD-ViT, which takes full advantage of both the variable spectral features and the spatial features.
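
To make the sequence construction step concrete, the following is a minimal NumPy sketch of how spectra in a cross-shaped region around a pixel could be gathered into a token sequence. It is not the authors' implementation: the function name `cross_sequence`, the `radius` parameter, the token ordering (center first, then the up, down, left, and right neighbors at increasing distance), and the reflect padding at image borders are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def cross_sequence(cube, row, col, radius=2):
    """Collect the spectra on a cross centered at (row, col).

    cube   : HSI array of shape (height, width, bands).
    radius : arm length of the cross in pixels (hypothetical parameter).

    Returns an array of shape (4 * radius + 1, bands). Each row is one
    spectrum and would serve as one token of the sequence fed to a ViT.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")

    # Assumption: reflect-pad the two spatial axes so that border pixels
    # still get a full cross. The source does not say how edges are handled.
    padded = np.pad(
        cube, ((radius, radius), (radius, radius), (0, 0)), mode="reflect"
    )
    r, c = row + radius, col + radius

    # Assumed ordering: center pixel first, then up, down, left, right
    # for each distance d = 1 .. radius.
    offsets = [(0, 0)]
    for d in range(1, radius + 1):
        offsets += [(-d, 0), (d, 0), (0, -d), (0, d)]

    return np.stack([padded[r + dr, c + dc] for dr, dc in offsets])


if __name__ == "__main__":
    # Toy cube: 5 x 6 pixels with 3 spectral bands.
    toy_cube = np.arange(5 * 6 * 3, dtype=np.float32).reshape(5, 6, 3)
    tokens = cross_sequence(toy_cube, row=2, col=3, radius=1)
    print(tokens.shape)
```

With `radius=1` the sequence holds five spectra (the center pixel and its four direct neighbors), so the sequence length grows linearly with the arm length rather than quadratically as it would for a full square patch.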

Full Text