Abstract

Remote sensing image scene classification takes image blocks as classification units and predicts their semantic descriptors. Because it is difficult to obtain enough labeled samples for all classes of remote sensing image scenes, zero-shot classification methods which can recognize image scenes that are not seen in the training stage are of great significance. By projecting the image visual features and the class semantic features into the latent space and ensuring their alignment, the variational autoencoder (VAE) generative model has been applied to address remote-sensing image scene classification under a zero-shot setting. However, the VAE model takes the element-wise square error as the reconstruction loss, which may not be suitable for measuring the reconstruction quality of the visual and semantic features. Therefore, this paper proposes to augment the VAE models with the generative adversarial network (GAN) to make use of the GAN’s discriminator in order to learn a suitable reconstruction quality metric for VAE. To promote feature alignment in the latent space, we have also proposed cross-modal feature-matching loss to make sure that the visual features of one class are aligned with the semantic features of the class and not those of other classes. Based on a public dataset, our experiments have shown the effects of the proposed improvements. Moreover, taking the ResNet models of ResNet18, extracting 512-dimensional visual features, and ResNet50 and ResNet101, both extracting 2048-dimensional visual features for testing, the impact of the different visual feature extractors has also been investigated. The experimental results show that better performance is achieved by ResNet18. This indicates that more layers of the extractors and larger dimensions of the extracted features may not contribute to the image scene classification under a zero-shot setting.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.