Abstract

Remote sensing scene classification (RSSC) refers to inferring semantic labels from the content of remote sensing scenes. Most recent works take a pretrained convolutional neural network (CNN) as the feature extractor to build a scene representation for RSSC. The activations in different layers of a CNN (named intermediate features) carry different spatial and semantic information, and recent works demonstrate that aggregating intermediate features into a scene representation can significantly improve classification accuracy for RSSC. However, the intermediate features are typically aggregated by unsupervised feature encoding methods (e.g., Bag-of-Visual-Words), and little attention has been paid to exploiting the semantic label information during feature aggregation. In this paper, to exploit the semantic label information, an end-to-end feature aggregation CNN (FACNN) is proposed to learn a scene representation for RSSC. In FACNN, a supervised convolutional feature encoding module and a progressive aggregation strategy are proposed to leverage the semantic label information when aggregating the intermediate features. FACNN integrates feature learning, feature aggregation, and the classifier into a unified end-to-end framework for joint training, so the scene representation is learned under the guidance of semantic labels, which results in better performance for RSSC. Extensive experiments on the AID, UC-Merced, and WHU-RS19 databases demonstrate that FACNN outperforms several state-of-the-art methods.
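To make the general idea concrete, the following is a minimal PyTorch sketch of aggregating intermediate CNN features into a scene representation that is trained jointly with the classifier, so the scene labels supervise the aggregation. The VGG-16 backbone, the tapped layer indices, the projection sizes, and fusion by concatenation are all illustrative assumptions; the abstract does not give the actual FACNN encoding module or its progressive aggregation strategy.

```python
# Illustrative sketch only: tap intermediate CNN activations, encode them,
# and fuse them into a scene representation classified end-to-end.
# Backbone, layer choices, and fusion scheme are assumptions, not the
# paper's actual FACNN design.
import torch
import torch.nn as nn
import torchvision.models as models

class FeatureAggregationNet(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        backbone = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        self.features = backbone.features  # pretrained convolutional layers
        # Layers whose activations we tap (assumed: ends of VGG-16 blocks 3-5).
        self.tap_layers = {15, 22, 29}
        self.pool = nn.AdaptiveAvgPool2d(1)  # reduce each map to a vector
        # Learned projections to a common dimension before fusing (assumed).
        self.proj = nn.ModuleDict({
            "15": nn.Linear(256, 128),
            "22": nn.Linear(512, 128),
            "29": nn.Linear(512, 128),
        })
        self.classifier = nn.Linear(128 * 3, num_classes)

    def forward(self, x):
        encoded = []
        for i, layer in enumerate(self.features):
            x = layer(x)
            if i in self.tap_layers:
                v = self.pool(x).flatten(1)           # B x C
                encoded.append(self.proj[str(i)](v))  # B x 128
        scene_repr = torch.cat(encoded, dim=1)        # fused scene representation
        return self.classifier(scene_repr)

# Training with cross-entropy on the scene labels makes the aggregation
# supervised: gradients from the labels flow into the encoding of every
# tapped intermediate layer.
model = FeatureAggregationNet(num_classes=30)
logits = model(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 30])
```

Because the classification loss backpropagates through every tapped encoding, the semantic labels shape the aggregated representation itself, which is the distinction the abstract draws against unsupervised encodings such as Bag-of-Visual-Words.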
