Abstract

Different-resolution change detection (DRCD) has become an urgent problem, with great potential in rapid monitoring applications such as disaster assessment and urban expansion. In DRCD tasks, the bi-temporal inputs have different resolutions, so conventional change detection (CD) methods cannot be applied directly. Previous studies have attempted to address this problem by reconstructing the low-resolution (LR) image into a high-resolution (HR) one, for example through interpolation or super-resolution (SR). However, these solutions are limited by the availability of training data, which makes it difficult to meet diverse application needs. Moreover, such image-level strategies ignore the interaction and alignment of high-level features. Therefore, we propose a new approach based on multi-model Transformers (MM-Trans), which addresses the resolution gap between bi-temporal inputs in DRCD tasks from the perspective of feature alignment. In MM-Trans, a weight-unshared feature extractor is first used to precisely capture the features of the different-resolution inputs; a spatial-aligned Transformer (sp-Trans) then aligns the LR-image features to the size of the HR-image features, optimized in a learnable way by an auxiliary token loss; after that, a semantic-aligned Transformer (se-Trans) allows the bi-temporal features to further interact and be aligned semantically; finally, a prediction head produces fine-grained change results. Experiments conducted on three common CD datasets, CDD, S2Looking, and HTCD, show the effectiveness of MM-Trans and demonstrate its potential in DRCD tasks.
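
To make the four-stage pipeline described above concrete, the following is a minimal sketch in PyTorch of how the stages could be wired together. All module designs, dimensions, and the LR-to-HR scale factor are assumptions for illustration; the actual sp-Trans, se-Trans, auxiliary token loss, and extractor architectures are defined in the paper, not here.

```python
import torch
import torch.nn as nn

class MMTransSketch(nn.Module):
    """Hypothetical sketch of the MM-Trans pipeline from the abstract.
    Module choices and sizes are illustrative assumptions only."""

    def __init__(self, dim=64, ratio=4):
        super().__init__()
        # Weight-unshared feature extractors for the HR and LR inputs
        self.hr_encoder = nn.Conv2d(3, dim, 3, padding=1)
        self.lr_encoder = nn.Conv2d(3, dim, 3, padding=1)
        # Stand-in for the spatial-aligned Transformer (sp-Trans):
        # approximated here by learnable upsampling of the LR features
        self.sp_trans = nn.Sequential(
            nn.Upsample(scale_factor=ratio, mode="bilinear", align_corners=False),
            nn.Conv2d(dim, dim, 3, padding=1),
        )
        # Stand-in for the semantic-aligned Transformer (se-Trans):
        # a standard Transformer encoder over flattened bi-temporal tokens
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.se_trans = nn.TransformerEncoder(layer, num_layers=2)
        # Prediction head producing a 2-class (change / no-change) map
        self.head = nn.Conv2d(2 * dim, 2, 1)

    def forward(self, hr_img, lr_img):
        f_hr = self.hr_encoder(hr_img)                  # (B, C, H, W)
        f_lr = self.sp_trans(self.lr_encoder(lr_img))   # spatially aligned to (B, C, H, W)
        b, c, h, w = f_hr.shape
        # Flatten both temporal features into token sequences and let them interact
        tokens = torch.cat([f_hr, f_lr], dim=0).flatten(2).transpose(1, 2)  # (2B, HW, C)
        tokens = self.se_trans(tokens)
        f_hr2, f_lr2 = tokens.transpose(1, 2).reshape(2, b, c, h, w)
        # Fuse the aligned bi-temporal features and predict the change map
        return self.head(torch.cat([f_hr2, f_lr2], dim=1))  # (B, 2, H, W)
```

Under these assumptions, a forward pass would take an HR image of shape (B, 3, H, W) and an LR image of shape (B, 3, H/4, W/4) and return a full-resolution two-class change map; the auxiliary token loss mentioned in the abstract would supervise the sp-Trans stage during training.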
