Abstract

The joint use of multisource remote-sensing (RS) data for Earth observation missions has drawn much attention. Although the fusion of several data sources can improve the accuracy of land-cover identification, many technical obstacles, such as disparate data structures, irrelevant physical characteristics, and a lack of training data, exist. In this article, a novel dual-branch method, consisting of a hierarchical convolutional neural network (CNN) and a transformer network, is proposed for fusing multisource heterogeneous information and improving joint classification performance. First, by combining the CNN with a transformer, the proposed dual-branch network can significantly capture and learn spectral–spatial features from hyperspectral image (HSI) data and elevation features from light detection and ranging (LiDAR) data. Then, to fuse these two sets of data features, a cross-token attention (CTA) fusion encoder is designed in a specialty. The well-designed deep hierarchical architecture takes full advantage of the powerful spatial context information extraction ability of the CNN and the strong long-range dependency modeling ability of the transformer network based on the self-attention (SA) mechanism. Four standard datasets are used in experiments to verify the effectiveness of the approach. The experimental results reveal that the proposed framework can perform noticeably better than state-of-the-art methods. The source code of the proposed method will be available publicly at <uri xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">https://github.com/zgr6010/Fusion_HCT.git</uri> .

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call