Abstract
• The SFM fuses IRRG and DSM data using cosine similarity and a gating mechanism to extract corresponding complementary features.
• The MCFAM extracts multiscale context cues, addressing object-scale variations in HRRSIs.
• The DAMM forms a dense-attention branch that adaptively captures both low- and high-level attention.

Although significant progress has been made in scene classification of high-resolution remote-sensing images (HRRSIs), dual-modal HRRSI scene classification remains an active and challenging problem. In this study, we introduce an end-to-end dense-attention–similarity-fusion network (DASFNet) for dual-modal HRRSIs. Specifically, we propose a dense-attention map module (DAMM) based on graph convolution, which adaptively captures long-range semantic cues and propagates shallow-attention cues to deeper layers to guide the generation of high-level feature-attention cues. At the encoder stage, DASFNet uses feature similarity to explore the correlation between dual-modal features; a similarity-fusion module (SFM) extracts complementary information by fusing features from the two modalities. A multiscale context-feature-aggregation module (MCFAM) strengthens the feature embedding at any two spatial scales, mitigating the scale-change problem. Extensive experiments on two HRRSI scene-classification benchmark datasets show that the proposed DASFNet outperforms state-of-the-art scene-classification approaches.
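The similarity-gated fusion idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `similarity_fusion`, the sigmoid gate, and the residual-style combination are assumptions made for clarity; only the use of per-pixel cosine similarity between IRRG and DSM features to gate the fusion comes from the abstract.

```python
import numpy as np

def similarity_fusion(f_irrg, f_dsm, eps=1e-8):
    """Hypothetical sketch of cosine-similarity-gated dual-modal fusion.

    f_irrg, f_dsm: (C, H, W) feature maps from the IRRG and DSM streams.
    Returns a fused (C, H, W) feature map.
    """
    # Per-pixel cosine similarity along the channel axis -> (H, W) in [-1, 1].
    num = (f_irrg * f_dsm).sum(axis=0)
    den = np.linalg.norm(f_irrg, axis=0) * np.linalg.norm(f_dsm, axis=0) + eps
    sim = num / den
    # Turn the similarity into a (0, 1) gate (sigmoid chosen for illustration).
    gate = 1.0 / (1.0 + np.exp(-sim))
    # Keep the IRRG stream and add the DSM stream weighted by the gate,
    # so complementary DSM cues contribute where the modalities agree more.
    return f_irrg + gate[None, :, :] * f_dsm

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8, 8))
b = rng.standard_normal((4, 8, 8))
print(similarity_fusion(a, b).shape)  # (4, 8, 8)
```

In an actual network the gate would typically be learned (e.g. a small convolution over the similarity map) rather than a fixed sigmoid; the sketch only shows how a similarity score can modulate cross-modal feature exchange.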