Abstract

Timely acquisition of information on earthquake-induced building damage is crucial for emergency assessment and post-disaster rescue. Optical remote sensing is a typical method for obtaining seismic data owing to its wide coverage and fast response. Convolutional neural networks (CNNs) are widely applied to remote sensing image recognition. However, their limited ability to extract and express global correlations between local image patches restricts the performance of dense building segmentation. This paper proposes an improved Swin Transformer to segment dense urban buildings from remote sensing images with complex backgrounds. The original Swin Transformer is used as the backbone of the encoder, and a convolutional block attention module is employed in the linear embedding and patch merging stages to focus on significant features. Hierarchical feature maps are then fused to strengthen feature extraction and fed into UPerNet, which serves as the decoder, to obtain the final segmentation map. Collapsed and non-collapsed buildings are labeled in remote sensing images of the Yushu and Beichuan earthquakes. Data augmentation by horizontal and vertical flipping, brightness adjustment, uniform fogging, and non-uniform fogging is performed to simulate real conditions. Ablation experiments and comparative studies validate the effectiveness of the proposed method and its superiority over the original Swin Transformer and several mature CNN-based segmentation models. The results show that the mean intersection-over-union of the improved Swin Transformer reaches 88.53%, an improvement of 1.3% over the original model. The stability, robustness, and generalization ability of dense building recognition under complex weather disturbances are also validated.
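
For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean intersection-over-union can be computed for a single label map. It is not the authors' evaluation code: the function name `mean_iou`, the label encoding (0 = background, 1 = non-collapsed, 2 = collapsed), and the choice to skip classes absent from both maps are all assumptions made for illustration. The abstract does not state whether the metric is accumulated per image or over the whole test set.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union between two 2-D label maps.

    pred and target are equally sized lists of rows of integer class IDs.
    Classes that appear in neither map are skipped (an assumption; other
    conventions count them as IoU = 0 or 1).
    """
    ious = []
    for c in range(num_classes):
        inter = 0  # pixels labeled c in both maps
        union = 0  # pixels labeled c in at least one map
        for p_row, t_row in zip(pred, target):
            for p, t in zip(p_row, t_row):
                in_p = p == c
                in_t = t == c
                inter += in_p and in_t
                union += in_p or in_t
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical 2x3 label maps: 0 = background, 1 = non-collapsed, 2 = collapsed
pred = [[0, 1, 1],
        [0, 2, 2]]
target = [[0, 1, 1],
          [0, 1, 2]]

score = mean_iou(pred, target, num_classes=3)
```

In this toy case the per-class IoU values are 2/2, 2/3, and 1/2, so their mean is 13/18.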
