Abstract

In hyperspectral images (HSIs), spatial context provides complementary information to abundant spectral features. In this paper, a united spectral-spatial framework named HTD-ViT based on vision transformer (ViT) is proposed for HTD tasks. The HTD-ViT leverages the ViT to learn discriminative spectral-spatial features of each pixel and its neighboring pixels. Meanwhile, the spectral-spatial sequence construction operation uses spectrums in the cross region centered on the selected pixel to produce the corresponding spectral-spatial sequence for ViT processing. Furthermore, the spectral-spatial sample selection procedure based on coarse detection addresses the issue of lacking well-labeled training instances in the HTD tasks. Finally, the spectral-spatial pixel-level detection combines the discriminative feature from the spectral and the spatial domains to suppress the background. In contrast to traditional spatial-spectral feature extraction methods that stack the original spectral feature with spatial neighborhood information directly, joint spectral-spatial inference in HTD-ViT can effectively discover the underlying contextual and structure information in HSIs. Experiments on real HSIs verify the effectiveness of HTD-ViT, which takes full advantage of both the variable spectral and spatial features.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.