Hyperspectral images (HSI) contain rich spatial and spectral detail, while light detection and ranging (LiDAR) data provide elevation information. Fusing HSI and LiDAR data can therefore yield more accurate image classification and has become an active research topic. However, it is difficult to capture complex local and global spatial-spectral associations, and building an effective interaction between multi-modal data is another important issue. To this end, this paper proposes a novel global-local transformer network (GLT-Net) for the joint classification of HSI and LiDAR data. The main idea is to fully exploit the advantage of the convolution operator in characterizing locally correlated features and the capability of the transformer architecture to learn long-range dependencies. Moreover, multi-scale feature fusion and probabilistic decision fusion strategies are designed within one framework to further improve classification performance. The proposed GLT-Net consists mainly of three parts: multi-scale local spatial feature learning, global spectral feature learning, and global-local feature fusion classification. Specifically, multi-modal image cubes of different sizes are first extracted and fed into convolutional neural networks (CNNs) to learn local spatial features, followed by multi-modal information propagation and spatial-attention-guided multi-scale feature fusion. Afterwards, by treating spectral feature channels as a sequence, vision transformers are introduced to model global spectral dependencies. Finally, multiple class estimations based on local and global features are integrated via a probabilistic decision fusion strategy. In this way, the complementary information of multi-modal data, as well as local and global spectral-spatial information, can be fully mined and jointly utilized.
Extensive experiments on three popular HSI and LiDAR datasets demonstrate that the proposed method outperforms state-of-the-art methods. The source code of the proposed method will be made publicly available at https://github.com/Ding-Kexin/GLT-Net.
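
The abstract does not state the exact rule used for probabilistic decision fusion. As a rough illustration only, the sketch below assumes that each branch (for example, a local CNN branch and a global transformer branch) outputs raw class scores, and that fusion is a weighted average of the per-branch softmax probabilities. The function names `softmax`, `fuse_decisions`, and `predict`, and the `weights` parameter, are hypothetical and are not taken from the GLT-Net code.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_decisions(branch_logits, weights=None):
    """Weighted average of per-branch class probabilities.

    ASSUMPTION: the averaging rule is illustrative; the paper's actual
    fusion strategy may differ.
    """
    if weights is None:
        weights = [1.0] * len(branch_logits)  # equal trust in every branch
    norm = sum(weights)
    probs = [softmax(logits) for logits in branch_logits]
    n_classes = len(probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probs)) / norm
        for c in range(n_classes)
    ]


def predict(branch_logits, weights=None):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_decisions(branch_logits, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical scores for one pixel from a local and a global branch.
local_scores = [2.0, 0.5, 0.1]
global_scores = [0.2, 3.0, 0.1]
fused_probs = fuse_decisions([local_scores, global_scores])
label = predict([local_scores, global_scores])
```

Averaging probabilities rather than raw scores keeps each branch's contribution on the same scale, so a branch with large logits does not dominate simply because of its magnitude.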