Abstract

Aerial image classification is of great significance in the remote sensing community, and much research has been conducted over the past few years. Most of these studies focus on categorizing an image into a single semantic label, while in the real world, an aerial image is often associated with multiple labels, e.g., multiple object-level labels in our case. Besides, a comprehensive picture of the objects present in a given high-resolution aerial image can provide a more in-depth understanding of the studied region. For these reasons, aerial image multi-label classification has been attracting increasing attention. However, one common limitation shared by existing methods is that the co-occurrence relationship of various classes, the so-called class dependency, is underexplored and leads to inconsiderate decisions. In this paper, we propose a novel end-to-end network, namely the class-wise attention-based convolutional and bidirectional LSTM network (CA-Conv-BiLSTM), for this task. The proposed network consists of three indispensable components: (1) a feature extraction module, (2) a class attention learning layer, and (3) a bidirectional LSTM-based sub-network. Specifically, the feature extraction module is designed to extract fine-grained semantic feature maps, while the class attention learning layer aims at capturing discriminative class-specific features. As the most important part, the bidirectional LSTM-based sub-network models the underlying class dependency in both directions and produces structured multiple object labels. Experimental results on the UCM multi-label dataset and the DFC15 multi-label dataset validate the effectiveness of our model quantitatively and qualitatively.
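As a rough illustration only, the three-component design described above could be wired up as in the following minimal PyTorch sketch. This is not the authors' implementation: the ResNet-18 backbone, the 1x1 convolution used for class attention, the attention-weighted pooling, and the hidden size are all assumptions made for the example.

```python
# Hypothetical sketch of a CA-Conv-BiLSTM-style model (not the authors' code).
import torch
import torch.nn as nn
import torchvision.models as models

class CAConvBiLSTM(nn.Module):
    def __init__(self, num_classes, lstm_hidden=128):
        super().__init__()
        # (1) Feature extraction module: convolutional backbone producing
        #     fine-grained semantic feature maps (here: ResNet-18 without
        #     its pooling/classification head, 512 output channels).
        backbone = models.resnet18(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        # (2) Class attention learning layer: a 1x1 convolution mapping the
        #     feature maps to one attention map per class.
        self.class_attention = nn.Conv2d(512, num_classes, kernel_size=1)
        # (3) Bidirectional LSTM over the sequence of class-specific features,
        #     modelling class dependencies in both directions.
        self.bilstm = nn.LSTM(input_size=512, hidden_size=lstm_hidden,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * lstm_hidden, 1)

    def forward(self, x):
        f = self.features(x)                                  # (B, 512, H, W)
        attn = torch.softmax(
            self.class_attention(f).flatten(2), dim=-1)       # (B, C, H*W)
        # Class-specific features via attention-weighted pooling of the maps.
        class_feats = torch.bmm(attn, f.flatten(2).transpose(1, 2))  # (B, C, 512)
        out, _ = self.bilstm(class_feats)                     # (B, C, 2*hidden)
        logits = self.classifier(out).squeeze(-1)             # (B, C)
        return logits  # one logit per class; train with BCEWithLogitsLoss

# Example usage with random input and 17 object classes:
model = CAConvBiLSTM(num_classes=17)
scores = model(torch.randn(2, 3, 224, 224))
print(scores.shape)  # torch.Size([2, 17])
```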

Highlights

  • The feature extraction module is designed to extract fine-grained semantic feature maps, while the class attention learning layer aims at capturing discriminative class-specific features

  • We propose an end-to-end trainable network architecture for multi-label classification, which consists of a feature extraction module, a class attention learning layer, and a bidirectional LSTM-based sub-network

  • We propose a novel network, CA-Conv-BiLSTM, for the multi-label classification of high-resolution aerial imagery



Introduction

With the boom of remote sensing techniques in recent years, a huge volume of high-resolution aerial imagery has become accessible and benefits a wide range of real-world applications, such as urban mapping (Marmanis et al., 2018; Audebert et al., 2018; Marcos et al., 2018; Mou and Zhu, 2018a), ecological monitoring (Zarco-Tejada et al., 2014; Wen et al., 2017), geomorphological analysis (Mou and Zhu, 2018b; Lucchesi et al., 2013; Weng et al., 2018; Cheng et al., 2017), and traffic management (Mou and Zhu, 2018c; Mou and Zhu, 2016; Li et al., 2018). Numerous studies, e.g., semantic segmentation (Ren et al., 2015; Long et al., 2015; Badrinarayanan et al., 2015) and object detection (Ren et al., 2015; Viola and Jones, 2001; Lin et al., 2017; Ren et al., 2017), have emerged recently. However, it is extremely labor- and time-consuming to acquire ground truths for these studies (i.e., pixel-wise segmentation masks and bounding-box-level annotations). Compared to these expensive labels, image-level labels (cf. multiple object-level labels in Fig. 1) come at a fairly low cost and are readily accessible.

