Abstract

Land cover classification from very high-resolution (VHR) remote sensing images is challenging because geographic scenes are complex and ground targets vary widely in shape and size. It is difficult to improve VHR remote sensing image classification results by using the spectral data directly or by applying traditional multi-scale feature extraction methods. To address this problem, we propose a multi-modality and multi-scale attention fusion network for land cover classification from VHR remote sensing images. First, building on an encoder-decoder network, we design a multi-modality fusion module that fuses the more useful features while avoiding redundant ones. This addresses the low classification accuracy for some objects that results from the weak feature representation of single-modality data. Second, we introduce a novel multi-scale spatial context enhancement module to improve feature fusion; it handles the large scale variation of objects in remote sensing images and captures long-range spatial relationships between objects. The proposed network and comparative networks were evaluated on two public datasets, Vaihingen and Potsdam. The proposed network achieves better classification results, with a mean F1-score of 88.6% on the Vaihingen dataset and 92.3% on the Potsdam dataset. Experimental results show that our model outperforms the compared state-of-the-art network models.
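
As an illustration of the metric quoted in the abstract, the following minimal sketch computes per-class F1 and its unweighted mean from a pixel-level confusion matrix. The function names and the toy matrix are hypothetical, and the paper's exact evaluation protocol (for example, which classes are averaged) is not stated in this excerpt, so this should be read as one common way to compute a mean F1-score rather than the authors' procedure.

```python
from typing import List


def per_class_f1(confusion: List[List[int]]) -> List[float]:
    """Return the F1-score of each class.

    confusion[i][j] is the number of pixels whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    scores = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # true c, predicted otherwise
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, true otherwise
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def mean_f1(confusion: List[List[int]]) -> float:
    """Unweighted (macro) average of the per-class F1-scores."""
    scores = per_class_f1(confusion)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up 3-class confusion matrix, for illustration only.
    toy = [[50, 2, 3],
           [4, 40, 6],
           [1, 5, 44]]
    print([round(s, 3) for s in per_class_f1(toy)], round(mean_f1(toy), 3))
```

Averaging per-class scores without weighting gives small classes the same influence as large ones, which is one reason a mean F1-score is often reported alongside overall accuracy.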

Highlights

  • Because very high-resolution (VHR) remote sensing images can provide more details of ground targets, they have been widely used for land cover classification and recognition under complex scenes. During land cover classification, it is a challenge to assign every pixel in a remote sensing image to a semantic category

  • We present a novel multi-scale spatial context enhancement module that combines the advantages of ASPP (Atrous Spatial Pyramid Pooling) and a non-local block to improve image feature fusion, which addresses the problem of large differences in target scale in VHR remote sensing images

  • We compared the performance of our model with that of recent deep-learning-based land cover classification networks for remote sensing images
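
The two ingredients named in the highlights, atrous (dilated) convolution at several rates and a non-local aggregation over all positions, can be illustrated with a one-dimensional toy. This is only a sketch under stated assumptions: the paper's module operates on 2D feature maps with learned weights, and its exact design is not given in this excerpt. The function names, the fixed averaging kernel, the rates, and the fusion by simple averaging are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def atrous_conv1d(x: Sequence[float], kernel: Sequence[float], rate: int) -> List[float]:
    """Dilated convolution with zero padding; output length equals len(x).

    A larger rate spreads the kernel taps further apart, so the same
    number of weights covers a wider context.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * rate
            if 0 <= j < len(x):  # taps falling outside the signal contribute zero
                acc += w * x[j]
        out.append(acc)
    return out


def non_local_1d(x: Sequence[float]) -> List[float]:
    """Each output is a softmax-weighted average over ALL positions.

    Similarity is the product of the two values (an assumption); a real
    non-local block would use learned embeddings.
    """
    out = []
    for xi in x:
        scores = [xi * xj for xj in x]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w * xj for w, xj in zip(weights, x)) / total)
    return out


def multi_scale_context(x: Sequence[float],
                        kernel: Sequence[float] = (1 / 3, 1 / 3, 1 / 3),
                        rates: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Fuse several atrous branches with a non-local branch by averaging.

    Averaging is a placeholder; a trained network would typically
    concatenate the branches and mix them with learned weights.
    """
    branches = [atrous_conv1d(x, kernel, r) for r in rates]
    branches.append(non_local_1d(x))
    return [sum(vals) / len(vals) for vals in zip(*branches)]


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(atrous_conv1d(signal, (1, 1, 1), 1))
    print(atrous_conv1d(signal, (1, 1, 1), 2))
    print(multi_scale_context(signal))
```

The atrous branches see fixed neighbourhoods of increasing width, while the non-local branch lets every position draw on every other position, which is the sense in which the two are complementary for objects of very different sizes.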

Summary

Introduction

Because VHR remote sensing images can provide more details of ground targets, they have been widely used for land cover classification and recognition under complex scenes. Assigning every pixel in a remote sensing image to a semantic category is a challenge. In contrast to single-target recognition, land cover classification recognizes multiple targets in the image scene at the same time, and the spatial distribution of ground targets can be counted. With the rapid development of deep learning techniques [5], convolutional neural networks (CNNs) [6] can provide hierarchical feature representation and learn deep semantic features, which are important and useful for improving model performance. CNNs have achieved significant success in the field of computer

