Abstract

Remote sensing (RS) scene classification is a crucial research topic in the RS community, and many methods based on convolutional neural networks (CNNs) have been proposed to improve classification performance. Because convolution operations are intrinsically local, CNNs are good at extracting local information but struggle to capture the global contextual information that is also needed to fully interpret RS scenes. Recently, the transformer has shown potential for learning global contextual information, but it pays less attention to local information. In this paper, we propose a new interactive dual-branch network for RS scene classification, named Resformer, which combines the efficiency of CNNs in extracting local information with the power of the transformer in capturing global information. In addition, we propose a two-way feature interaction module (TFIM), which not only efficiently fuses CNN-based local features with transformer-based global features, but also extracts multi-scale information from RS scenes. Finally, we use a class score fusion strategy to integrate the features extracted from the two branches. Encouraging experimental results on two public RS scene data sets demonstrate that Resformer is effective for the RS scene classification task.
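
The abstract does not specify how the class score fusion is computed. As a rough illustration only, one common form of score-level fusion is a weighted average of the per-class probabilities produced by each branch. The sketch below assumes that form; the function names, the `weight` parameter, and the example logits are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_class_scores(cnn_logits, transformer_logits, weight=0.5):
    """Weighted average of the two branches' class probabilities.

    Assumption: `weight` balances the CNN branch against the transformer
    branch. The paper's actual fusion rule may differ.
    """
    if len(cnn_logits) != len(transformer_logits):
        raise ValueError("both branches must score the same set of classes")
    p_cnn = softmax(cnn_logits)
    p_transformer = softmax(transformer_logits)
    return [
        weight * a + (1.0 - weight) * b
        for a, b in zip(p_cnn, p_transformer)
    ]


# Hypothetical logits for a three-class problem, one list per branch.
fused = fuse_class_scores([2.0, 0.5, 0.1], [0.2, 1.5, 0.3])
# The predicted class is the index with the highest fused probability.
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Averaging probabilities rather than raw logits keeps the two branches on a comparable scale, so neither branch dominates simply because its scores have a larger magnitude.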
