A Joint Convolutional Cross ViT Network for Hyperspectral and Light Detection and Ranging Fusion Classification

Haitao Xu, Yuzhe Liu, Zhiyuan Zhang, Jiaojiao Li, Tie Zheng, Changbin Xue

doi:10.3390/rs16030489

Open Access

Journal: Remote Sensing	Publication Date: Jan 26, 2024
Citations: 7	License type: CC BY 4.0

Abstract

The fusion of hyperspectral imagery (HSI) and light detection and ranging (LiDAR) data for classification has received widespread attention and has driven significant progress in remote sensing research and applications. However, common CNN architectures cannot model remote sensing images globally, while transformer architectures do not capture local features effectively. To address these limitations, this paper proposes a classification framework for multisource remote sensing image fusion. First, a spatial and spectral feature projection network is constructed from parallel feature extraction over the HSI and LiDAR data, which helps extract joint spatial, spectral, and elevation features from the different data sources. Second, to construct the local–global nonlinear feature mapping more flexibly, a network architecture that couples multiscale convolution with a multiscale vision transformer is proposed. In addition, a plug-and-play nonlocal feature token aggregation module is designed to adaptively adjust the domain offsets between different features, while a class token is employed to reduce the complexity of high-dimensional feature fusion. On three open-source remote sensing datasets, the proposed multisource fusion classification framework improves performance by about 1% to 3% over other state-of-the-art algorithms.
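
The abstract does not spell out how the class token reduces fusion cost. One common way to do this in cross-attention vision transformers is to let only the class token of one branch attend to the tokens of the other branch, so the cost grows linearly with the number of tokens. The NumPy sketch below illustrates that general idea only; it is not the authors' implementation. The token counts, the embedding size `d`, the random projection matrices, and the function name `class_token_cross_attention` are all hypothetical, and the class tokens are stand-ins computed as token means.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def class_token_cross_attention(cls_token, other_tokens, w_q, w_k, w_v):
    """Update one branch's class token by attending to the other branch.

    cls_token:    (d,)   class token of the querying branch
    other_tokens: (n, d) tokens of the other branch
    Returns the updated (d,) class token and the (n,) attention weights.
    """
    q = cls_token @ w_q                      # (d,) single query
    k = other_tokens @ w_k                   # (n, d) keys
    v = other_tokens @ w_v                   # (n, d) values
    scores = k @ q / np.sqrt(q.shape[-1])    # (n,) scaled dot products
    weights = softmax(scores)                # (n,) sums to 1
    return cls_token + weights @ v, weights  # residual update


# Hypothetical sizes and random data, for illustration only.
rng = np.random.default_rng(0)
d = 8
hsi_tokens = rng.normal(size=(16, d))    # assumed HSI branch tokens
lidar_tokens = rng.normal(size=(4, d))   # assumed LiDAR branch tokens
hsi_cls = hsi_tokens.mean(axis=0)        # stand-in for a learned class token
lidar_cls = lidar_tokens.mean(axis=0)    # stand-in for a learned class token
w_q, w_k, w_v = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))

# Each class token queries the other modality's tokens.
hsi_cls_fused, w_hl = class_token_cross_attention(hsi_cls, lidar_tokens, w_q, w_k, w_v)
lidar_cls_fused, w_lh = class_token_cross_attention(lidar_cls, hsi_tokens, w_q, w_k, w_v)

# Concatenated class tokens would feed a classifier head.
fused = np.concatenate([hsi_cls_fused, lidar_cls_fused])
```

Because each branch issues a single query, the attention step touches each token of the other branch once, rather than forming a full token-by-token attention matrix across both modalities.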

Full Text