Abstract

High-resolution remote sensing image scene classification is a challenging visual task because of the large intra-class variance and small inter-class variance among scene categories. To recognize scene categories accurately, it is essential to learn discriminative features from both the global view and local critical regions. Recent efforts encourage the network to learn multigranularity features by destroying the spatial information of the input image at different scales, which introduces meaningless edges that are harmful to training. In this study, we propose a novel method named Semantic Multigranularity Feature Learning Network (SMGFL-Net) for remote sensing image scene classification. The core idea is to learn both global and multigranularity local features from rearranged intermediate feature maps, thus eliminating the meaningless edges. These features are then fused for the final prediction. The proposed framework is compared with a collection of state-of-the-art (SOTA) methods on two fine-grained remote sensing image scene datasets, NWPU-RESISC45 and the Aerial Image Dataset (AID). Through a comparative study, we justify several design choices, including the branch granularities, fusion strategies, pooling operations, and the necessity of feature map rearrangement. Moreover, the overall results show that SMGFL-Net consistently outperforms peer methods in classification accuracy, and that its advantage is more apparent with less training data, demonstrating the efficacy of the feature learning in our approach.

Highlights

  • Remote sensing (RS) refers to the practice of observing, recording, measuring, and deriving information about the Earth’s land and water surfaces using images acquired from an overhead perspective [1]

  • Although MGML-FENet achieves the best accuracy with a training ratio of 0.2 on NWPU-RESISC45, SMGFL-Net outperforms it on the Aerial Image Dataset (AID)

  • The proposed SMGFL-Net learns multigranularity features through a destruction operation on intermediate feature maps by a jigsaw puzzle generator with different sizes, which avoids the meaningless edges appearing in prior studies
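The jigsaw-style rearrangement of an intermediate feature map mentioned in the highlights can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `jigsaw_shuffle`, the `(C, H, W)` array layout, and the use of NumPy are assumptions made for illustration only. The sketch splits the spatial grid into `n x n` patches and permutes whole patches, so the content inside each patch is left intact.

```python
import numpy as np


def jigsaw_shuffle(fmap, n, rng=None):
    """Randomly permute the n x n grid of spatial patches of a (C, H, W) array.

    Hypothetical helper for illustration; names and layout are assumptions.
    """
    c, h, w = fmap.shape
    if h % n or w % n:
        raise ValueError("H and W must both be divisible by n")
    rng = np.random.default_rng() if rng is None else rng
    ph, pw = h // n, w // n

    # Cut the map into patches: (C, n, ph, n, pw) -> (n*n, C, ph, pw).
    patches = (
        fmap.reshape(c, n, ph, n, pw)
        .transpose(1, 3, 0, 2, 4)
        .reshape(n * n, c, ph, pw)
    )

    # Shuffle whole patches; values inside each patch keep their layout.
    patches = patches[rng.permutation(n * n)]

    # Stitch the shuffled patches back into a (C, H, W) map.
    return (
        patches.reshape(n, n, c, ph, pw)
        .transpose(2, 0, 3, 1, 4)
        .reshape(c, h, w)
    )


# Example: a toy 2-channel 4x4 "feature map" shuffled at granularity 2.
x = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
y = jigsaw_shuffle(x, n=2, rng=np.random.default_rng(0))
```

Calling the helper with several values of `n` (for example 2 and 4, chosen here arbitrarily) would give one rearranged map per granularity, while `n=1` leaves the map unchanged and corresponds to the global view. Because the shuffle acts on feature maps rather than on input pixels, no artificial patch borders are drawn into the image itself.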


Summary

Introduction

Remote sensing (RS) refers to the practice of observing, recording, measuring, and deriving information about the Earth’s land and water surfaces using images acquired from an overhead perspective [1]. Scene categories can be hard to tell apart: samples of Medium Residential and Dense Residential look similar to each other, whereas the four Palace samples exhibit a semantic gap in their color, size, shape, and edge distributions. To better recognize these scenes, both global and local features are crucial. Global statistical features help distinguish Medium Residential from Dense Residential, and local features are essential for recognizing the Palace.
