Abstract

Convolutional neural networks (CNNs) have been successfully used in remote sensing scene classification and identification because of their ability to capture deep spatial feature representations. However, the performance of deep models inevitably hits a bottleneck when classifying multimodality-dominated scenes rather than single-modality-dominated scenes, owing to the high similarity among different categories. In this study, we propose a novel multigranularity fused convolutional neural network (MGFN) to automatically capture the latent ontological features of remote sensing images. We first design a multigranularity module that progressively crops input images to learn multigrained features, which describe an image at different levels of detail. We then design a maxout-based module that learns a Gaussian covariance matrix for each granularity; these second-order features express the latent ontological essence of the inputs, and comparing them across granularities selects the most discriminative responses. We further provide an adaptive fusion module that normalizes and fuses the features of all granularities. Finally, an SVM classifier classifies the fused matrix of each input image. Extensive experiments and evaluations, particularly on multimodality-dominated scenes, demonstrate that the proposed network achieves promising results on public remote sensing datasets.
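
To make the described pipeline concrete, the sketch below walks through the four stages named in the abstract: multigrained cropping, per-granularity Gaussian covariance features, maxout-based selection, and normalized fusion ahead of an SVM. It is a minimal illustration only; the ResNet-18 trunk, the crop ratios, and the maxout-then-normalize fusion rule are assumptions for demonstration, not the paper's exact modules.

```python
# Hedged sketch of an MGFN-style pipeline. Backbone, crop ratios, and the
# fusion rule are illustrative assumptions, not the paper's exact design.
import torch
import torch.nn.functional as F
from torchvision import models

# Convolutional trunk only (drop avgpool/fc) so spatial feature maps remain.
resnet = models.resnet18(weights="IMAGENET1K_V1").eval()
trunk = torch.nn.Sequential(*list(resnet.children())[:-2])

def multigranularity_crops(img, ratios=(1.0, 0.75, 0.5)):
    """Progressively centre-crop one (3, H, W) image into multigrained views
    (hypothetical ratios), each resized back to the network input size."""
    _, h, w = img.shape
    views = []
    for r in ratios:
        ch, cw = int(h * r), int(w * r)
        top, left = (h - ch) // 2, (w - cw) // 2
        crop = img[:, top:top + ch, left:left + cw].unsqueeze(0)
        views.append(F.interpolate(crop, size=(224, 224),
                                   mode="bilinear", align_corners=False))
    return views

def covariance_feature(fmap):
    """Gaussian covariance of a (C, H, W) feature map: a second-order
    descriptor of one granularity."""
    c = fmap.shape[0]
    x = fmap.reshape(c, -1)
    x = x - x.mean(dim=1, keepdim=True)
    return (x @ x.t()) / (x.shape[1] - 1)   # (C, C) covariance matrix

def mgfn_descriptor(img):
    """Fused second-order descriptor for one image (simplified fusion)."""
    with torch.no_grad():
        covs = torch.stack([covariance_feature(trunk(v)[0])
                            for v in multigranularity_crops(img)])
    # Maxout across granularities keeps the most distinguished response;
    # L2 normalization makes the fused matrix comparable across images.
    fused = covs.max(dim=0).values
    fused = fused / (fused.norm() + 1e-8)
    return fused.flatten().numpy()

# Downstream (assumed): fit an SVM on the fused descriptors, e.g.
#   from sklearn.svm import SVC
#   clf = SVC(kernel="rbf").fit([mgfn_descriptor(x) for x in X], y)
```

A useful property of the covariance descriptor in this sketch is that it is always C x C regardless of crop size, so features from different granularities can be compared and fused element-wise.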

Highlights

  • Remote sensing scene classification is one of the most active and challenging research topics in the geoscience and remote sensing community, since it is the process of classifying remotely sensed images into discrete sets of land use and land cover categories with semantic meanings [1]–[4].

  • For multimodality-dominated scenes, these problems are harder to solve because many categories have hierarchical ontologies. To address them, we propose a novel multigranularity fused convolutional neural network (MGFN) to capture the latent ontological features of remote sensing images automatically.

  • Propelled by the high-level feature learning capabilities of convolutional neural networks (CNNs), remote sensing scene classification driven by deep neural networks has drawn remarkable attention and achieved significant breakthroughs.
Summary

INTRODUCTION

Remote sensing scene classification is one of the most active and challenging research topics in the geoscience and remote sensing community, since it is the process of classifying remotely sensed images into discrete sets of land use and land cover categories with semantic meanings [1]–[4]. The primary strategy of existing deep learning-based methods is to apply a pretrained CNN directly to target remote sensing scene images or to fine-tune the pretrained model on the target dataset. Whether transferred or not, most of these CNN models are designed and used for single-modality-dominated scenes. Moreover, most deep learning-based classification methods can learn high-level features but cannot connect them to the high-level semantic meanings of category labels, because most remote sensing datasets lack well-constructed ontological structures. For multimodality-dominated scenes, these problems are harder to solve because many categories have hierarchical ontologies. To address these problems, we propose a novel multigranularity fused convolutional neural network (MGFN) to capture the latent ontological features of remote sensing images automatically. A sketch of the two transfer strategies mentioned above follows.
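
As a concrete illustration, the following hedged sketch sets up a torchvision ResNet-18 either as a frozen feature extractor with a new classification head or for full fine-tuning. The class count (21, as in the UC Merced land-use dataset) and the learning rates are illustrative assumptions, not values taken from the paper.

```python
# Hedged sketch of the two common transfer strategies for scene
# classification; class count and hyperparameters are assumptions.
import torch
from torchvision import models

NUM_CLASSES = 21  # e.g. UC Merced land-use categories (assumption)

# Strategy 1: fixed feature extractor — freeze the pretrained trunk and
# train only a new classification head on the target scene images.
model = models.resnet18(weights="IMAGENET1K_V1")
for p in model.parameters():
    p.requires_grad = False
model.fc = torch.nn.Linear(model.fc.in_features, NUM_CLASSES)
head_opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

# Strategy 2: fine-tuning — update all layers on the target dataset,
# with a smaller learning rate for the pretrained trunk than for the head.
model_ft = models.resnet18(weights="IMAGENET1K_V1")
model_ft.fc = torch.nn.Linear(model_ft.fc.in_features, NUM_CLASSES)
ft_opt = torch.optim.SGD([
    {"params": [p for n, p in model_ft.named_parameters()
                if not n.startswith("fc")], "lr": 1e-4},
    {"params": model_ft.fc.parameters(), "lr": 1e-3},
], momentum=0.9)
```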

RELATED WORK
GRANULARITY EXTRACTION MODULE
MAXOUT-BASED MODULE
ADAPTIVE FUSION METHOD
CLASSIFICATION LAYER
EXPERIMENTS AND ANALYSIS
EXPERIMENTAL DATASETS
Findings
CONCLUSION