Abstract

In recent years, various deep learning based methods have been successfully developed for change detection, such as the Convolutional Neural Network (CNN) based U-Net and its variants, and Transformer based models. However, CNNs struggle to learn global representations, while Transformers are weak at capturing local ones. In this paper, we therefore propose a novel deep network, the Multi-scale Attention based Transformer U-Net (MATU), which combines the strengths of CNNs and Transformers to learn both local and global features effectively. The backbone of MATU is a U-Net. In the encoder, a Siamese network extracts features from the two input images, and a transformer module further refines the feature pairs it produces. The difference of the refined feature pairs is fed into an Atrous Spatial Pyramid Pooling (ASPP) module to generate a distance map. In the decoder, axial-attention blocks are integrated with the corresponding multi-level feature differences from the encoder to progressively produce and refine the change map through attention-based upsampling. Extensive experiments on two widely used benchmark datasets, SYSU-CD and LEVIR-CD, demonstrate that MATU achieves state-of-the-art performance. Our code is available at https://github.com/easm002/MATU.
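The core data flow described above, a shared-weight (Siamese) encoder applied to both images followed by an absolute feature difference, can be illustrated with a minimal toy sketch. This is not the MATU implementation (the transformer refinement, ASPP, and axial-attention decoder are omitted); the encoder here is a single hypothetical dense layer, included only to show the shared-weight and differencing pattern:

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_encoder(x, w):
    # Siamese branch: the SAME weights w are applied to both inputs,
    # so identical inputs yield identical features.
    return np.tanh(x @ w)

def feature_difference(img_a, img_b, w):
    fa = shared_encoder(img_a, w)
    fb = shared_encoder(img_b, w)
    # Absolute difference of the two feature maps; in the paper this
    # difference is what gets passed on to the ASPP module.
    return np.abs(fa - fb)

w = rng.standard_normal((8, 4))          # toy encoder weights
a = rng.standard_normal((16, 8))         # "image" at time 1 (flattened)
b = rng.standard_normal((16, 8))         # "image" at time 2 (flattened)

diff = feature_difference(a, b, w)
print(diff.shape)                        # (16, 4)
```

Because the weights are shared, unchanged regions produce near-zero feature differences, which is what makes the difference map a useful change signal.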
