Abstract
Remote sensing (RS) image scene classification faces many challenges because of interference from the differing characteristics of different geographical elements. To address this problem, we propose a multi-branch ensemble network that enhances feature representation by fusing features at both the final output logits and the intermediate feature maps. However, simply adding branches increases model complexity and reduces inference efficiency. To address this issue, we embed a self-distillation (SD) method that transfers knowledge from the ensemble network to its main branch. After optimization with SD, the main branch achieves performance close to that of the ensemble network. During inference, the other branches can be removed to simplify the whole model. In this paper, we first design a compact multi-branch ensemble network that can be trained in an end-to-end manner. Then, we apply the SD method to the output logits and feature maps. Compared with previous methods, our proposed architecture (ESD-MBENet) achieves strong classification accuracy with a compact design. Extensive experiments are conducted on three benchmark RS datasets (AID, NWPU-RESISC45, and UC-Merced) with three classic baseline models (VGG16, ResNet50, and DenseNet121). The results show that the proposed ESD-MBENet achieves better accuracy than previous, more complex state-of-the-art (SOTA) models. Moreover, extensive visualization analysis makes our method more convincing and interpretable.
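The self-distillation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the averaging rule used to form the ensemble logits, the temperature value, the KL-divergence and mean-squared-error loss forms, and all function names below are assumptions chosen for illustration only.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature (assumed hyperparameter)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_logits(branch_logits):
    """Fuse per-branch logits by simple averaging (an assumed fusion rule)."""
    n = len(branch_logits)
    return [sum(column) / n for column in zip(*branch_logits)]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def logit_distillation_loss(teacher_logits, student_logits, temperature=3.0):
    """Soft-label loss: the main branch (student) mimics the ensemble (teacher)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p, q)

def feature_distillation_loss(teacher_features, student_features):
    """Mean squared error between flattened feature maps (assumed loss form)."""
    squared = [(t - s) ** 2 for t, s in zip(teacher_features, student_features)]
    return sum(squared) / len(squared)

# Hypothetical logits for one image over three scene classes.
main_branch = [2.0, 0.5, -1.0]
aux_branch_1 = [1.0, 1.5, -0.5]
aux_branch_2 = [2.4, 0.1, -1.0]

ensemble = ensemble_logits([main_branch, aux_branch_1, aux_branch_2])
sd_loss = logit_distillation_loss(ensemble, main_branch)
print("ensemble logits:", ensemble)
print("logit self-distillation loss:", sd_loss)
```

In a setup like this, the distillation terms would be added to the usual classification loss during training, and only the main branch would be kept at inference time, which is the simplification the abstract describes.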
Highlights
Remote sensing scene classification has recently become a popular task in practical applications
1) We propose a more compact yet efficient multibranch ensemble network, explore the weight-sharing potential of multibranch networks, and add feature augmentation modules that compensate for the lack of diversity, in order to overcome the interference of different geographical elements in remote sensing images
A discriminative modality distillation approach is introduced in [47]: the teacher is trained on multimodal data, and the student model learns from the teacher model to improve remote sensing image classification performance
Summary
Remote sensing scene classification has recently become a popular task in practical applications. It reveals geographical characteristics, such as land utilization and vegetation coverage [1]. With the progress of RS scene classification, research on local land planning, tree planting, and afforestation can be carried out more intelligently. With the rapid development of deep learning technology [2], [3], methods. Manuscript received April 1, 2021; revised June 21, 2021 and September 10, 2021; accepted October 23, 2021.
More From: IEEE Transactions on Geoscience and Remote Sensing