Change detection (CD) of remote sensing (RS) images has flourished alongside the ongoing innovation of convolutional neural networks (CNNs). Yet despite this rapid progress, identifying unbalanced variations between foreground and background categories remains an open obstacle, especially under limited samples and massive interference such as seasonal turnover, illumination changes, and building renovation. Moreover, to date, none of the off-the-shelf methods probes the feasibility of direct interaction between bitemporal images before deriving difference features. In this article, we propose a dual-branch multilevel intertemporal network (DMINet) to derive change representations efficiently and effectively. Specifically, by unifying self-attention (SelfAtt) and cross-attention (CrossAtt) in a single module, we present an intertemporal joint-attention (JointAtt) block that steers the global feature distribution of each input, encouraging information coupling between intralevel representations while suppressing task-irrelevant interference. In addition, focusing on the detection of difference features, we design a reliable architecture around two concerns: difference acquisition using subtraction and concatenation, and multilevel difference aggregation using incremental feature alignment. Built on a plain backbone without sophisticated structures, i.e., ResNet18, our model outperforms other state-of-the-art (SOTA) methods on four CD datasets, especially in cases with scarce samples, and it achieves this with light computational overhead.
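The two mechanisms named above can be illustrated in miniature. The following is a hedged numpy sketch, not the authors' implementation: it shows one plausible reading of "unifying self-attention and cross-attention in a single module" (each branch's queries attend to a key/value pool drawn from both temporal inputs, so intra- and inter-image attention happen in one pass) and of "difference acquisition using subtraction and concatenation". All function names, shapes, and the pooling scheme are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def joint_attention(x1, x2):
    """Illustrative joint-attention idea (assumption, not the paper's code):
    queries from each temporal branch attend to keys/values pooled from
    BOTH inputs, covering self- and cross-attention in a single operation.
    x1, x2: (N, D) token matrices for the two acquisition dates."""
    d = x1.shape[-1]
    kv = np.concatenate([x1, x2], axis=0)        # joint key/value pool, (2N, D)
    a1 = softmax(x1 @ kv.T / np.sqrt(d)) @ kv    # branch-1 output, (N, D)
    a2 = softmax(x2 @ kv.T / np.sqrt(d)) @ kv    # branch-2 output, (N, D)
    return a1, a2

def difference_features(f1, f2):
    """Difference acquisition via subtraction (change magnitude) and
    concatenation (joint bitemporal context), as the abstract names them."""
    sub = np.abs(f1 - f2)                        # subtraction branch, (N, D)
    cat = np.concatenate([f1, f2], axis=-1)      # concatenation branch, (N, 2D)
    return sub, cat

# Toy usage with 4 tokens of dimension 8 per date.
rng = np.random.default_rng(0)
t1 = rng.standard_normal((4, 8))
t2 = rng.standard_normal((4, 8))
a1, a2 = joint_attention(t1, t2)
sub, cat = difference_features(a1, a2)
```

In this reading, computing attention before taking differences is what gives the bitemporal images "direct interaction" prior to difference-feature extraction; the paper's actual JointAtt block may weight or gate the pooled keys/values differently.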