Abstract

• The SFM fuses IRRG and DSM data with cosine similarity and a gating mechanism to extract the corresponding complementary features.
• The MCFAM extracts multiscale context cues, addressing object-scale variations in HRRSIs.
• The DAMM forms a dense-attention branch that adaptively captures both low- and high-level attention.

Although significant progress has been made in scene classification of high-resolution remote-sensing images (HRRSIs), dual-modal HRRSI scene classification remains an active and challenging problem. In this study, we introduce an end-to-end dense-attention similarity-fusion network (DASFNet) for dual-modal HRRSIs. Specifically, we propose a dense-attention map module (DAMM) based on graph convolution, which adaptively captures long-range semantic cues and directs shallow attention cues to the deep layers to guide the generation of high-level feature-attention cues. At the encoder stage, DASFNet uses feature similarity to explore the correlation between dual-modal features; a similarity-fusion module (SFM) extracts complementary information by fusing features from the different modalities. A multiscale context-feature-aggregation module (MCFAM) strengthens the feature embedding between any two spatial scales, addressing the problem of object-scale variation. Extensive experiments on two HRRSI scene-classification benchmark datasets show that the proposed DASFNet outperforms leading scene-classification approaches.
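The similarity-fusion step described above lends itself to a short sketch. The snippet below shows one plausible way to fuse IRRG and DSM feature maps with per-pixel cosine similarity and a learned gate; it is a minimal illustration under stated assumptions, not the paper's SFM: the class name SimilarityFusion, the 1x1-convolution gate, and the weighting scheme are all hypothetical choices.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimilarityFusion(nn.Module):
    """Hypothetical similarity-fusion sketch: cosine similarity between
    IRRG and DSM feature maps weights a gated combination of the two
    modalities. Illustrative only; not the paper's exact SFM."""
    def __init__(self, channels):
        super().__init__()
        # Gate computed from the concatenated dual-modal features (assumption).
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, irrg_feat, dsm_feat):
        # Per-pixel cosine similarity along channels, rescaled to [0, 1].
        sim = F.cosine_similarity(irrg_feat, dsm_feat, dim=1).unsqueeze(1)
        sim = (sim + 1.0) / 2.0                         # (N, 1, H, W)
        # Gating mechanism selects complementary DSM information.
        g = self.gate(torch.cat([irrg_feat, dsm_feat], dim=1))
        # Keep IRRG responses where the modalities agree; elsewhere
        # inject gated DSM features as complementary cues.
        return sim * irrg_feat + (1.0 - sim) * g * dsm_feat

# Usage on toy dual-modal feature maps:
fuse = SimilarityFusion(channels=64)
irrg = torch.randn(2, 64, 32, 32)    # IRRG encoder features
dsm  = torch.randn(2, 64, 32, 32)    # DSM encoder features
out  = fuse(irrg, dsm)               # fused features, shape (2, 64, 32, 32)

The design intuition is that dissimilar regions are where the DSM stream carries complementary height cues, so the sketch injects gated DSM features there, while similar regions fall back to the IRRG response.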
