Abstract

The discriminative ability of image features plays a decisive role in content-based remote sensing image retrieval (CBRSIR). However, widely used convolutional neural networks cannot focus on the discriminative features of important scenes, which results in unsatisfactory retrieval performance in complex contexts. To address this issue, this paper proposes an attention-enhanced, end-to-end discriminative network with multiscale learning for CBRSIR. First, a multiscale dilated convolution module is embedded into some of ResNet50's residual blocks to enlarge the receptive field and capture the multiscale features of remote sensing image scenes. Then, a lightweight and efficient triplet attention module is added after each residual block to capture the salient features of remote sensing images and to establish inter-dimensional dependencies through residual transformations. In addition, the network is trained end to end with an online label smoothing loss to reduce the intra-class variance of features and enhance inter-class separability. Experimental results on four publicly available remote sensing image datasets show that the network achieves state-of-the-art or competitive performance, especially on the complex-scene dataset UCMD, where the average retrieval precision improves by 3.23% to 29.35% over other recent methods.
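
The abstract does not give the form of the online label smoothing loss, so the following is only a rough sketch of the conventional (fixed, uniform) label smoothing cross-entropy that such losses build on. The function names and the smoothing factor `eps` are hypothetical and are not taken from the paper; an online variant would replace the fixed uniform targets with targets updated during training.

```python
import math


def smoothed_targets(label, num_classes, eps=0.1):
    """Uniform label smoothing (assumed variant, not the paper's online one).

    The true class receives 1 - eps + eps / K and every other class eps / K,
    so the targets still sum to 1.
    """
    off_value = eps / num_classes
    return [
        off_value + (1.0 - eps if k == label else 0.0)
        for k in range(num_classes)
    ]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw class scores."""
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(
        sum(math.exp(z - max_logit) for z in logits)
    )
    return [z - log_sum_exp for z in logits]


def label_smoothing_loss(logits, label, eps=0.1):
    """Cross-entropy between the smoothed targets and the predicted distribution."""
    targets = smoothed_targets(label, len(logits), eps)
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(targets, log_probs))


if __name__ == "__main__":
    scores = [4.0, 1.0, 0.5]  # illustrative logits for a 3-class scene problem
    print("plain cross-entropy:", label_smoothing_loss(scores, 0, eps=0.0))
    print("smoothed (eps=0.1): ", label_smoothing_loss(scores, 0, eps=0.1))
```

With `eps=0.0` the function reduces to ordinary cross-entropy. For a confident, correct prediction the smoothed loss is slightly larger, which discourages over-confident outputs.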
