Abstract

Advancements in Earth observation technologies have greatly enhanced the potential of integrating hyperspectral image (HSI) and Light Detection and Ranging (LiDAR) data for land use and land cover classification. Despite this, most existing methods focus on employing deep network layers to extract features from the two heterogeneous data modalities, overlooking a gradual, shallow-to-deep modeling of the data representations. Furthermore, excessive network layers can cause modality-specific features to deteriorate, thereby lowering classification performance. This paper proposes a novel cross-modal feature aggregation and enhancement network for the joint classification of HSI and LiDAR data. First, a cross-modal feature fusion module is developed that exploits spatial scale consistency to interchange and fuse feature embeddings at the pixel level, largely preserving the original information of the two heterogeneous modalities. Two straightforward strategies (i.e., addition and concatenation) are then employed in the shallow network layers before the features are passed to the transformer encoder. The former helps the model discern subtler distinctions and refine spatial location details; the latter preserves information integrity, effectively mitigating the risk of feature loss. Moreover, invertible neural networks and a feature enhancement module are introduced, leveraging the complementary information of HSI and LiDAR data to enhance the detail and texture information extracted in deeper layers. Extensive experiments on the Houston2013, Trento, and MUUFL datasets demonstrate that the proposed method outperforms several state-of-the-art models on three evaluation metrics, achieving an accuracy improvement of up to 2%. The proposed model offers new insights for HSI and LiDAR classification, which is critical for accurate environmental monitoring, urban planning, and precision agriculture. The source code is publicly accessible at https://github.com/zhangyiyan001/CMFAEN.
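
To make the shallow-layer fusion described in the abstract concrete, the sketch below shows one way pixel-level HSI and LiDAR embeddings could be combined via the two named strategies (element-wise addition and channel concatenation) before a transformer encoder. This is a minimal illustrative assumption, not the authors' released implementation (available at the repository linked above); all module names, channel counts, and hyperparameters are hypothetical.

```python
# Illustrative sketch of shallow cross-modal fusion (addition + concatenation)
# feeding a transformer encoder. Not the official CMFAEN code; dimensions are
# assumed for demonstration only.
import torch
import torch.nn as nn


class ShallowCrossModalFusion(nn.Module):
    def __init__(self, hsi_channels=144, lidar_channels=1, embed_dim=64):
        super().__init__()
        # Project each modality to a common embedding size at the pixel level.
        self.hsi_embed = nn.Conv2d(hsi_channels, embed_dim, kernel_size=1)
        self.lidar_embed = nn.Conv2d(lidar_channels, embed_dim, kernel_size=1)
        # Reduce the concatenated features back to the embedding size.
        self.concat_proj = nn.Conv2d(2 * embed_dim, embed_dim, kernel_size=1)
        # A small transformer encoder consumes the fused token sequence.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

    def forward(self, hsi, lidar):
        h = self.hsi_embed(hsi)      # (B, D, H, W)
        l = self.lidar_embed(lidar)  # (B, D, H, W)
        added = h + l                                                # addition strategy
        concatenated = self.concat_proj(torch.cat([h, l], dim=1))   # concatenation strategy
        fused = added + concatenated
        # Flatten spatial positions into tokens for the transformer encoder.
        tokens = fused.flatten(2).transpose(1, 2)                    # (B, H*W, D)
        return self.encoder(tokens)


if __name__ == "__main__":
    hsi = torch.randn(2, 144, 11, 11)    # e.g., an 11x11 HSI patch with 144 bands
    lidar = torch.randn(2, 1, 11, 11)    # matching single-band LiDAR elevation patch
    out = ShallowCrossModalFusion()(hsi, lidar)
    print(out.shape)                     # torch.Size([2, 121, 64])
```

In this sketch the addition branch blends the two modalities position by position, while the concatenation branch retains both feature sets before projection, loosely mirroring the complementary roles the abstract attributes to the two strategies.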
