Remote sensing (RS) scene classification plays an important role in a wide range of RS applications. Recently, convolutional neural networks (CNNs) have been applied to scene classification in RS images and have achieved impressive performance. However, most existing CNN-based methods either rely only on the high-level features from the last convolutional layer, discarding important information present at other levels, or directly fuse features from different levels, which introduces redundant or even mutually exclusive information. Inspired by the attention mechanism of the human visual system, in this article we explore a novel relation-attention model and design an end-to-end relation-attention network (RANet) that learns powerful multilevel feature representations to further improve classification performance. First, convolutional features at different levels are extracted with pretrained CNNs. Second, a multiscale feature computation module connects the features at different levels and generates multiscale semantic features. Third, a novel relation-attention model exploits scale contextual information to focus on critical features while avoiding redundant and even distracting ones. Finally, the resulting relation-attention features are concatenated and fed into a softmax layer for the final classification. Experiments on four well-known RS scene classification datasets (UC-Merced, WHU-RS19, AID, and OPTIMAL-31) show that our method outperforms several state-of-the-art algorithms.
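The abstract does not give implementation details, but the pipeline it outlines (multi-level feature extraction from a pretrained CNN, multiscale fusion, relation attention across scales, concatenation into a softmax classifier) can be sketched. Below is a minimal PyTorch sketch under stated assumptions: the ResNet-50 backbone, the shared 256-dimensional embedding, and the scaled dot-product attention over per-scale descriptors are illustrative choices, not the paper's actual RANet design.

```python
# Hedged sketch of a RANet-style pipeline. Module names, dimensions, and the
# exact attention form are assumptions for illustration, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class MultiScaleFusion(nn.Module):
    """Projects feature maps from several backbone stages to a common channel
    width and spatial size, yielding one feature tensor per scale."""
    def __init__(self, in_channels=(512, 1024, 2048), dim=256, pool=7):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(c, dim, 1) for c in in_channels)
        self.pool = nn.AdaptiveAvgPool2d(pool)

    def forward(self, feats):                      # list of (B, C_i, H_i, W_i)
        scales = [self.pool(p(f)) for p, f in zip(self.proj, feats)]
        return torch.stack(scales, dim=1)          # (B, S, dim, pool, pool)

class RelationAttention(nn.Module):
    """Scaled dot-product attention across scales: each scale attends to the
    others, down-weighting redundant or conflicting scale information."""
    def __init__(self, dim=256):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x):                          # (B, S, dim, H, W)
        b, s, d, h, w = x.shape
        t = x.flatten(3).mean(-1)                  # (B, S, dim) per-scale descriptor
        attn = torch.softmax(self.q(t) @ self.k(t).transpose(1, 2) / d ** 0.5, dim=-1)
        return attn @ self.v(t)                    # (B, S, dim) relation-attended features

class RANetSketch(nn.Module):
    def __init__(self, num_classes=31, dim=256):
        super().__init__()
        backbone = resnet50(weights="IMAGENET1K_V1")   # pretrained CNN feature extractor
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu,
                                  backbone.maxpool, backbone.layer1)
        self.stages = nn.ModuleList([backbone.layer2, backbone.layer3, backbone.layer4])
        self.fusion = MultiScaleFusion(dim=dim)
        self.relation = RelationAttention(dim=dim)
        self.head = nn.Linear(3 * dim, num_classes)

    def forward(self, x):
        x = self.stem(x)
        feats = []
        for stage in self.stages:                  # collect features at multiple levels
            x = stage(x)
            feats.append(x)
        z = self.relation(self.fusion(feats))      # (B, 3, dim)
        return self.head(z.flatten(1))             # concatenated -> softmax logits

# Usage: OPTIMAL-31 has 31 classes, hence the default num_classes above.
logits = RANetSketch()(torch.randn(2, 3, 224, 224))
probs = F.softmax(logits, dim=-1)
```

The design choice worth noting is that attention is computed over whole scales rather than spatial positions, which matches the abstract's claim of exploiting scale contextual information to suppress redundant or distracting levels.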