Abstract

Analyzing land cover changes with multi-temporal remote sensing (RS) images is crucial for environmental protection and land planning. In this paper, we explore Remote Sensing Image Change Captioning (RSICC), a new task that aims to generate human-like language descriptions of the land cover changes in multi-temporal RS images. We propose a novel Transformer-based RSICC model (RSICCformer), which consists of three main components: 1) a CNN-based feature extractor that generates high-level features of RS image pairs, 2) a dual-branch Transformer encoder that improves the ability of the features to discriminate changes, and 3) a caption decoder that generates sentences describing the differences. The dual-branch Transformer encoder consists of a hierarchy of processing stages that capture and recognize multiple changes of interest. Concretely, in the dual-branch Transformer encoder, we use the bi-temporal feature differences as keys to enhance the image features (queries) of each temporal image. To explore the RSICC task, we build a large-scale dataset named LEVIR-CC, which contains 10,077 pairs of bi-temporal RS images and 50,385 sentences describing the differences between images. We benchmark existing state-of-the-art change captioning methods designed for synthetic images on the LEVIR-CC dataset, and our RSICCformer outperforms them by a significant margin (+4.98% on BLEU-4 and +9.86% on CIDEr-D). Attention visualizations also suggest that our model can focus on changes of interest and ignore irrelevant changes.
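To make the query/key arrangement concrete, the following is a minimal NumPy sketch of a dual-branch encoder in which each temporal image's features act as queries and the bi-temporal feature difference acts as keys. It rests on several assumptions that the abstract does not state: single-head scaled dot-product attention, the difference computed as `f2 - f1`, the difference features also serving as values, a residual connection, separate projection weights per branch and per stage, and three stacked stages. The names `diff_cross_attention`, `dual_branch_layer`, and `make_params` are hypothetical; this is an illustration, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def diff_cross_attention(feat_q, diff, w_q, w_k, w_v):
    """Single-head attention (assumed): queries from one temporal image,
    keys (and, by assumption, values) from the bi-temporal difference.
    feat_q, diff: (N, C) token matrices; w_*: (C, C) projection weights."""
    q = feat_q @ w_q
    k = diff @ w_k
    v = diff @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (N, N) query-key similarities
    attn = softmax(scores, axis=-1)           # each row sums to 1
    return feat_q + attn @ v, attn            # residual connection (assumed)

def dual_branch_layer(f1, f2, params1, params2):
    # Both branches share the same difference features as keys,
    # but each enhances its own temporal image's features.
    diff = f2 - f1
    out1, a1 = diff_cross_attention(f1, diff, *params1)
    out2, a2 = diff_cross_attention(f2, diff, *params2)
    return out1, out2, a1, a2

rng = np.random.default_rng(0)
N, C = 16, 8   # e.g. a 4x4 feature map flattened to 16 tokens with 8 channels

def make_params():
    # Hypothetical random (w_q, w_k, w_v) projections for one branch.
    return tuple(rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))

f1 = rng.standard_normal((N, C))   # features of the earlier image
f2 = rng.standard_normal((N, C))   # features of the later image

# Hierarchy of stages (3 assumed); the difference is recomputed at each stage.
stages = [(make_params(), make_params()) for _ in range(3)]
x1, x2 = f1, f2
for p1, p2 in stages:
    x1, x2, a1, a2 = dual_branch_layer(x1, x2, p1, p2)
```

One consequence of these assumptions: when the two inputs are identical, the difference (and hence every value vector) is zero, so each branch passes its features through unchanged. Only regions that differ between the two dates contribute new information to either branch.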
