Abstract

Remote sensing image scene classification faces challenges such as differences in semantic granularity across scene categories and imbalanced numbers of samples per category, which cause deep convolutional networks (DCNs) to learn wrong features. This article proposes a multiple granularity semantic learning network (MGSN), which includes a multiple granularity semantic learning (MGSL) module and a nonuniform sampling augmentation (NUA) module. Specifically, the MGSL module makes full use of the semantic information of scenes at different granularities, guiding the network to learn global and local features simultaneously. The relationship between semantic features of different granularity is also explored: learning coarse-grained features helps to improve the learning of fine-grained semantic features, whereas learning fine-grained semantics can inhibit the learning of coarse-grained semantic features. The NUA module combines sampling and sample augmentation to balance the sample distribution, which can avoid the overfitting caused by oversampling. The proposed MGSN achieved state-of-the-art classification accuracy on two large-scale remote sensing image scene classification datasets, Million-AID and NWPU-RESISC45. With 10% and 20% of the NWPU-RESISC45 samples used for training, MGSN achieves 91.92% and 94.33% top-1 accuracy, respectively. In experiments conducted on the Million-AID dataset, the proposed MGSN performed best among 18 DCNs. Compared with the baseline, FixEfficientNet, MGSN improved top-1 and top-5 accuracy by 10.63% and 5.47%, respectively, at a low complexity cost.
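The abstract states only that NUA combines sampling with sample augmentation to balance the class distribution while avoiding the overfitting that plain oversampling can cause. The sketch below illustrates that general idea under stated assumptions: every class is oversampled up to the size of the largest class, and each added copy passes through an augmentation function so it is not an exact duplicate. The names `balance_with_augmentation` and `jitter` are hypothetical, the samples are toy feature lists standing in for images, and the noise-based `jitter` stands in for real image augmentations such as flips, rotations, or crops. It is not the authors' NUA module.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A sample is (features, label); the feature list is a toy stand-in for an image.
Sample = Tuple[List[float], int]


def balance_with_augmentation(
    samples: List[Sample],
    augment: Callable[[List[float], random.Random], List[float]],
    seed: int = 0,
) -> List[Sample]:
    """Oversample every class up to the size of the largest class (hypothetical helper).

    Originals are kept unchanged; each extra copy is passed through `augment`,
    so minority classes gain perturbed variants instead of exact duplicates.
    """
    rng = random.Random(seed)
    counts = Counter(label for _, label in samples)
    target = max(counts.values())

    # Group feature vectors by class label.
    by_class: Dict[int, List[List[float]]] = {}
    for features, label in samples:
        by_class.setdefault(label, []).append(features)

    balanced: List[Sample] = list(samples)
    for label, members in by_class.items():
        for _ in range(target - len(members)):
            source = rng.choice(members)  # draw a real sample of this class
            balanced.append((augment(source, rng), label))
    return balanced


def jitter(features: List[float], rng: random.Random) -> List[float]:
    """Toy augmentation (hypothetical): add small uniform noise to each value."""
    return [v + rng.uniform(-0.05, 0.05) for v in features]


if __name__ == "__main__":
    data = [([1.0, 2.0], 0)] * 5 + [([3.0, 4.0], 1)] * 2
    balanced = balance_with_augmentation(data, jitter)
    print(Counter(label for _, label in balanced))  # Counter({0: 5, 1: 5})
```

Augmenting only the added copies keeps the original samples untouched; a real training pipeline might instead resample and augment on the fly at every epoch so that the perturbed copies change over time.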

Highlights

  • Remote sensing image scene classification [1]–[4] has been widely used in fields [5] such as land surveying, nature monitoring, and urban planning [6].

  • We explored the relation between semantic features of different granularity, finding that coarse-grained feature learning helps to improve the learning of fine-grained semantic features, while fine-grained semantic feature learning can inhibit the learning of coarse-grained semantic features.

  • It is observed that our method performs well in most categories, such as wastewater plant, golf course, stadium, parking lot, and pier. These results show the effectiveness of the proposed multiple granularity semantic learning method, and the performance gradually improves as the semantic granularity increases.
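The highlights describe learning scene semantics at several granularities at once. The abstract does not specify how MGSL combines them, so the sketch below assumes one common multi-task setup: one classifier head per granularity level, each trained with softmax cross-entropy, with the coarser labels derived from the fine label through a class hierarchy and the per-level losses summed with weights. The function names, the `parent_maps` hierarchy, the example logits, and the weights are all hypothetical; this is not the authors' MGSL module.

```python
import math
from typing import Dict, List, Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one sample, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def multi_granularity_loss(
    logits_per_level: List[Sequence[float]],
    fine_label: int,
    parent_maps: List[Dict[int, int]],
    weights: Sequence[float],
) -> float:
    """Weighted sum of cross-entropy losses, one per semantic granularity (hypothetical).

    logits_per_level[0] holds the finest-level logits. parent_maps[k] maps a class
    index at level k to its parent class at level k + 1, so every coarser label is
    derived from the single fine-grained label.
    """
    label = fine_label
    total = 0.0
    for level, logits in enumerate(logits_per_level):
        total += weights[level] * cross_entropy(logits, label)
        if level < len(parent_maps):
            label = parent_maps[level][label]  # move one level coarser
    return total


if __name__ == "__main__":
    # Hypothetical hierarchy: 4 fine classes grouped into 2 coarse classes.
    fine_to_coarse = {0: 0, 1: 0, 2: 1, 3: 1}
    fine_logits = [2.0, 0.5, -1.0, 0.0]
    coarse_logits = [1.5, -1.5]
    loss = multi_granularity_loss(
        [fine_logits, coarse_logits],
        fine_label=0,
        parent_maps=[fine_to_coarse],
        weights=[1.0, 0.5],
    )
    print(f"{loss:.4f}")
```

Deriving the coarse labels from the fine label means each image needs only one annotation; the per-level weights are where a design would trade off coarse-grained against fine-grained learning.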


Summary

Introduction

Remote sensing image scene classification [1]–[4] has been widely used in fields [5] such as land surveying, nature monitoring, and urban planning [6]. It has made great progress [7], [8] with the development of deep learning [9], [10] and automatic machine learning [11], such as neural architecture search (NAS) technology [12]. Because remote sensing scenes are labeled with different semantic granularities, deep convolutional networks (DCNs) must learn global and local features simultaneously. It is difficult for a classifier to simultaneously learn multiple granularity semantic information and the features of regions at different scales. (Corresponding author: Shengyang Li.)

