Abstract

Remote sensing scene classification (RSSC) is an active topic in the field of remote sensing and has attracted a lot of attention due to its wide range of applications. Deep learning methods, especially convolutional neural networks (CNNs), significantly improve the performance of RSSC due to their strong feature extraction capabilities. However, the complicated spatial layout and diverse target distribution of remote sensing images make RSSC challenging. Current CNNs tend to describe the global semantics of high-level image features, but the extraction of local semantics, multi-scale features, and anisotropic contextual features in remote sensing images needs to be enhanced to cope with these challenges. To this end, an end-to-end hybrid structure, namely multi-scale attentive region adaptive aggregation (MARAA) learning, is proposed, which makes full use of the rich semantic information of deep convolutional features and the high robustness of local adaptive aggregation. First, we extract spatial feature maps from different layers of a CNN, so that our feature extractor can learn multi-scale semantic representations. Second, an attention-enhanced local adaptive aggregation learning strategy is designed to aggregate the spatial features of each scale: dual attention is utilized to enhance the semantic features of local regions, the local regions are divided into groups, and different orders of spatial adaptive aggregation learning based on hierarchical attention are designed to explore arbitrary contexts of local semantics. Subsequently, a context gating mechanism with sparse fusion is proposed to merge the adaptively aggregated local semantic features of different scale spaces, so as to exploit the advantages of cross-scale feature fusion. Finally, experiments on five publicly available RSSC benchmarks show that MARAA significantly outperforms many state-of-the-art methods by capturing deep adaptive internal correlations among multi-scale attentive regions of an image.
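The abstract names the main ingredients of MARAA without giving implementation details. As a rough illustration only, the PyTorch sketch below wires together the components the abstract describes: feature maps taken from several backbone stages, a dual (channel + spatial) attention block per scale, and a learned sigmoid gate that fuses the per-scale descriptors before classification. The ResNet-50 backbone, all module names, and the use of global pooling in place of the paper's region adaptive aggregation are assumptions for the sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50


class DualAttention(nn.Module):
    """Channel + spatial attention on one feature map (a common stand-in
    for the paper's dual attention; the exact design is an assumption)."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention from globally pooled descriptors.
        w = torch.sigmoid(self.channel_fc(x.mean(dim=(2, 3)))).view(b, c, 1, 1)
        x = x * w
        # Spatial attention from per-position channel statistics.
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial_conv(s))


class MARAASketch(nn.Module):
    """Hypothetical multi-scale pipeline: attention-enhance each backbone
    stage, pool to a descriptor, then fuse descriptors with a sigmoid gate."""

    def __init__(self, num_classes, embed_dim=256):
        super().__init__()
        backbone = resnet50(weights=None)
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.stages = nn.ModuleList([backbone.layer1, backbone.layer2,
                                     backbone.layer3, backbone.layer4])
        stage_channels = [256, 512, 1024, 2048]
        self.attn = nn.ModuleList(DualAttention(c) for c in stage_channels)
        self.proj = nn.ModuleList(nn.Conv2d(c, embed_dim, 1) for c in stage_channels)
        # Context gate: one sigmoid weight per scale, conditioned on all scales.
        self.gate = nn.Linear(embed_dim * len(stage_channels), len(stage_channels))
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        x = self.stem(x)
        descs = []
        for stage, attn, proj in zip(self.stages, self.attn, self.proj):
            x = stage(x)
            f = proj(attn(x))                 # attention-enhanced, projected map
            descs.append(f.mean(dim=(2, 3)))  # global pooling stands in for
                                              # the paper's region aggregation
        g = torch.sigmoid(self.gate(torch.cat(descs, dim=1)))  # per-scale gates
        fused = sum(g[:, i:i + 1] * d for i, d in enumerate(descs))
        return self.head(fused)


if __name__ == "__main__":
    model = MARAASketch(num_classes=45)  # e.g., a 45-class RSSC benchmark
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 45])
```

The gate here is dense rather than sparse; reproducing the paper's sparse fusion and hierarchical-attention group aggregation would require details the abstract does not provide.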
