Integrating segmentation and vision-language model for automated and interpretable building damage assessment from satellite imagery
- Preprint Article
1
- 10.5194/egusphere-egu23-5778
- May 15, 2023
Natural and man-made disasters threaten human life, flora and fauna, and infrastructure. Detecting infrastructure damage quickly and accurately right after a disaster is critical, and detecting and assessing such damage also helps in managing financial strategy. Recently, many researchers and agencies have worked to build high-resolution satellite imagery databases covering pre- and post-disaster events. Advanced remote sensing satellites can image the Earth's surface at spatial resolutions of up to 30 cm on a daily basis. These high spatial resolution (HSR) images can help assess the damage caused by a natural hazard by comparing pre- and post-disaster data. However, such imagery has limitations, such as cloud occlusion: buildings under thick cloud cannot be recognised in optical images. Manually assessing the severity of damage to buildings and infrastructure by comparing bi-temporal HSR satellite or airborne imagery is tedious and subjective. On the other hand, unmanned aerial vehicles (UAVs), whose use is emerging, can assess the situation precisely. High-resolution UAV imagery and HSR satellite imagery can therefore complement each other for critical infrastructure damage assessment. In this study, a novel approach integrates UAV data with HSR satellite imagery for building damage assessment using a convolutional neural network (CNN) based deep learning model. The work is divided into two fundamental sub-tasks: first, building localisation in the pre-event images; second, damage classification, which assigns each building instance in the post-disaster images a damage-level label reflecting its degree of damage.
For the study, HSR satellite imagery of 36 pairs of pre- and post-event natural hazards was acquired for 2021-22, and UAV data available for these events were also collected from open data sources. The data were then pre-processed, and building damage was assessed using a deep object-based semantic change detection framework (ChangeOS). This model was trained on the xView2 building damage assessment dataset, which comprises ~20,000 images with ~730,000 building polygons from pre- and post-disaster events around the globe between 2011 and 2018. The experimental setup consists of training on the global dataset and testing on regional-scale building damage assessment with HSR satellite imagery and local-scale assessment with UAV imagery. Bi-temporal assessment of HSR images achieved an F1 score of ~67% for the 2022 Indonesia earthquake and ~64% for the 2021 Uttarakhand flooding event. HSR UAV imagery of the Haiti earthquake event in 2011 yielded a lower but promising F1 score of ~54%. It is inferred that merging HSR imagery from satellites and UAVs for building damage assessment with the ChangeOS framework is a robust tool for promoting future research in infrastructure maintenance strategy and policy management for disaster response.
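The abstract reports F1 scores without saying how they are aggregated. As a hedged illustration, the sketch below computes F1 from true-positive, false-positive, and false-negative counts and combines localization and damage scores following the xView2 challenge convention as we understand it (a harmonic mean over per-damage-level F1, then a 0.3/0.7 weighting). Whether ChangeOS results here use this convention is an assumption, and all counts are hypothetical.

```python
# Sketch only: the harmonic mean over damage levels and the 0.3/0.7 weighting
# follow the xView2 challenge convention as we understand it; the paper's
# reported F1 values may be aggregated differently.

def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def harmonic_mean(values):
    """Harmonic mean; any class with F1 = 0 drives the result to 0."""
    if any(v == 0 for v in values):
        return 0.0
    return len(values) / sum(1.0 / v for v in values)


def xview2_style_score(loc_counts, damage_counts):
    """loc_counts: (tp, fp, fn) for building localization.
    damage_counts: one (tp, fp, fn) tuple per damage level."""
    loc_f1 = f1_score(*loc_counts)
    damage_f1 = harmonic_mean([f1_score(*c) for c in damage_counts])
    return {
        "localization_f1": loc_f1,
        "damage_f1": damage_f1,
        "overall": 0.3 * loc_f1 + 0.7 * damage_f1,
    }


# Hypothetical counts for four damage levels (no, minor, major, destroyed).
scores = xview2_style_score(
    (80, 20, 20),
    [(50, 10, 10), (30, 10, 10), (20, 10, 10), (40, 10, 10)],
)
print(scores)
```

Because of the harmonic mean, one poorly detected damage level can pull the damage F1 well below the arithmetic average of the per-class scores.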
- Conference Article
126
- 10.1109/icpr48806.2021.9412295
- Jan 10, 2021
Accurate and fine-grained information about the extent of damage to buildings is essential for directing Humanitarian Aid and Disaster Response (HADR) operations in the immediate aftermath of any natural calamity. In recent years, satellite and UAV (drone) imagery has been used for this purpose, sometimes aided by computer vision algorithms. Existing computer vision approaches for building damage assessment typically rely on a two-stage approach, consisting of building detection using an object detection model, followed by damage assessment through classification of the detected building tiles. These multi-stage methods are not end-to-end trainable and suffer from poor overall results. We propose RescueNet, a unified model that can simultaneously segment buildings and assess the damage levels of individual buildings, and that can be trained end-to-end. To model the composite nature of this problem, we propose a novel localization-aware loss function, which consists of a Binary Cross-Entropy loss for building segmentation and a foreground-only selective Categorical Cross-Entropy loss for damage classification, and we show a significant improvement over the widely used Cross-Entropy loss. RescueNet is tested on the large-scale and diverse xBD dataset, achieves significantly better building segmentation and damage classification performance than previous methods, and generalizes across varied geographical regions and disaster types.
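The localization-aware loss described above can be sketched on flattened per-pixel predictions: binary cross-entropy over every pixel for building segmentation, plus categorical cross-entropy computed only on ground-truth building (foreground) pixels, so damage predictions on background pixels contribute nothing. This is a minimal pure-Python sketch, not RescueNet's code; the equal weighting of the two terms and the argument layout are assumptions.

```python
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one pixel; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def localization_aware_loss(seg_probs, seg_targets, dmg_probs, dmg_targets, eps=1e-7):
    """Sketch of a segmentation + foreground-only damage loss (equal weights assumed).

    seg_probs:   per-pixel building probability
    seg_targets: per-pixel ground truth, 1 = building, 0 = background
    dmg_probs:   per-pixel softmax vectors over damage classes
    dmg_targets: per-pixel damage class index (ignored on background pixels)
    """
    n = len(seg_probs)
    seg_loss = sum(bce(p, y, eps) for p, y in zip(seg_probs, seg_targets)) / n

    # Selective cross-entropy: only pixels labelled as building are scored.
    fg = [i for i, y in enumerate(seg_targets) if y == 1]
    if fg:
        cls_loss = -sum(math.log(max(dmg_probs[i][dmg_targets[i]], eps)) for i in fg) / len(fg)
    else:
        cls_loss = 0.0
    return seg_loss + cls_loss


# Two pixels: one building pixel, one background pixel (hypothetical values).
print(localization_aware_loss([0.9, 0.1], [1, 0], [[0.7, 0.3], [0.5, 0.5]], [0, 1]))
```

Masking the classification term this way keeps the damage head from being penalized for arbitrary outputs on background, which a plain per-pixel cross-entropy over all classes would do.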
- Video Transcripts
- 10.48448/gv4w-yz78
- Dec 29, 2020
- Underline Science Inc.
Accurate and fine-grained information about the extent of damage to buildings is essential for directing Humanitarian Aid and Disaster Response (HADR) operations in the immediate aftermath of any natural calamity. In recent years, satellite and UAV (drone) imagery has been used for this purpose, sometimes aided by computer vision algorithms. Existing computer vision approaches for building damage assessment typically rely on a two-stage approach, consisting of building detection using an object detection model, followed by damage assessment through classification of the detected building tiles. These multi-stage methods are not end-to-end trainable and suffer from poor overall results. We propose RescueNet, a unified model that can simultaneously segment buildings and assess the damage levels of individual buildings, and that can be trained end-to-end. To model the composite nature of this problem, we propose a novel localization-aware loss function, which consists of a Binary Cross-Entropy loss for building segmentation and a foreground-only selective Categorical Cross-Entropy loss for damage classification, and we show a significant improvement over the widely used Cross-Entropy loss. RescueNet is tested on the large-scale and diverse xBD dataset, achieves significantly better building segmentation and damage classification performance than previous methods, and generalizes across varied geographical regions and disaster types.
- Research Article
79
- 10.1111/mice.12981
- Feb 24, 2023
- Computer-Aided Civil and Infrastructure Engineering
Large‐scale building damage assessment using a novel hierarchical transformer architecture on satellite images
- Research Article
- 10.1155/ijae/5599522
- Jan 1, 2025
- International Journal of Aerospace Engineering
In the contemporary defense industry and the realm of air traffic safety, the identification of aircraft on land and in the air is of paramount importance. Contemporary radar systems can track aircraft; however, these systems are inherently dependent on human intervention, which heightens the risk of undesirable events. Image processing techniques have emerged as a pivotal component in aircraft detection. Specifically, methodologies such as image classification, object detection, and segmentation facilitate the precise detection and tracking of aircraft. For direct detection, however, segmentation and object detection models are required. In this study, aircraft segmentation and detection were performed on satellite imagery using the U‐Net segmentation model and the YOLO object detection model. The dataset comprised a total of 103 satellite images, each containing one or more aircraft. Various performance metrics were obtained during the training and testing phases of the models. The highest validation IoU (Intersection over Union) of 61.3% and validation F1 score of 85.1% were obtained with the U‐Net segmentation model, while the YOLOv5‐m object detection model achieved an F1 score of 79.8% and a mAP (mean average precision) of 77.7%.
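The two families of metrics quoted above are computed on different objects: IoU and F1 for the U-Net are pixel-level scores on masks, while mAP for YOLO is built on box-level IoU, where a predicted box usually counts as a true positive only above an IoU threshold (the abstract does not state which threshold was used). A minimal sketch of both IoU computations, with hypothetical inputs:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) with x2 > x1, y2 > y1."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # 0 if boxes don't overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mask_iou_and_f1(pred, target):
    """Binary-mask IoU and F1 (Dice) from flattened 0/1 sequences.

    Convention chosen here: two empty masks count as perfect agreement (1.0).
    """
    tp = sum(1 for p, t in zip(pred, target) if p and t)
    fp = sum(1 for p, t in zip(pred, target) if p and not t)
    fn = sum(1 for p, t in zip(pred, target) if t and not p)
    if tp + fp + fn == 0:
        return 1.0, 1.0
    return tp / (tp + fp + fn), 2 * tp / (2 * tp + fp + fn)


print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))          # overlapping boxes
print(mask_iou_and_f1([1, 1, 0, 0], [1, 0, 1, 0]))  # toy 4-pixel masks
```

For a single mask the two scores are tied by F1 = 2·IoU / (1 + IoU), so F1 is never lower than IoU on the same mask.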
- Research Article
172
- 10.1193/1.1650865
- Feb 1, 2004
- Earthquake Spectra
Newly available optical satellite images with 1‐m ground resolution, such as IKONOS, mean that rapid postdisaster damage assessment might be made over large areas. Such surveys could be of great value to emergency management and post‐event recovery operations and have particular promise for earthquake areas, where damage distribution is often very uneven. In this paper, three satellite images taken before and after the 26 January 2001 Gujarat earthquake were studied for damage assessment purposes. The images comprised post‐earthquake cover of the city of Bhuj, which was close to the epicenter, and pre‐ and post‐earthquake cover of the city of Ahmedabad. The assessment data were then compared with damage surveys actually made on‐site. Three separate experiments were conducted. In the first, the satellite image of Bhuj was compared with detailed ground photos of 28 severely damaged buildings taken at about the same time as the satellite image, to investigate the levels and types of damage that can and cannot be identified. In the second experiment, the whole city center of Bhuj was damage mapped using only the satellite image. This was subsequently compared with a map produced from a building‐by‐building damage survey. In the third experiment, pre‐ and post‐earthquake images for a large area of Ahmedabad were compared and totally collapsed buildings were identified. These sites were subsequently visited to confirm the accuracy of the observations. The experimental results indicate that rapid visual screening can identify areas of heavy damage and individual collapsed buildings, even when comparative cover does not exist. The need to develop a tool with direct application to support emergency response is discussed.
- Conference Article
- 10.1109/sbr/wre66973.2025.11249658
- Oct 13, 2025
Establishing accurate correspondence between sonar and satellite images is a nontrivial task due to differences in modality, resolution, and environmental noise, especially in underwater scenarios with GPS-denied environments. This work investigates the integration of semantic segmentation for matching satellite and sonar images. We evaluated a diverse set of state-of-the-art architectures, including convolutional models (U-Net, U-Net++, FPN, PSPNet, LinkNet, and DeepLab v3+) and attention-based models (MA-Net and SegFormer), with a focus on their ability to capture local structures, multiscale features, and contextual dependencies relevant for robust cross-modal matching. Experimental results indicate that, while convolutional networks deliver efficient and accurate segmentation of salient structures, attention-based models improve matching performance in complex scenarios by modeling long-range spatial dependencies. Among the architectures evaluated, MA-Net achieves superior performance, with a pixel accuracy of 0.9519, a mean IoU of 0.9494, and a matching score of 0.3618, underscoring the effectiveness of attention mechanisms. These findings lay the groundwork for future research on unified segmentation and matching frameworks specifically designed for autonomous underwater navigation.
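Pixel accuracy and mean IoU, the two segmentation metrics reported for MA-Net above, can both be read off a class confusion matrix. The sketch below shows one standard way to compute them on flattened label maps; skipping classes absent from both prediction and ground truth is a convention we chose, and the matching score is not reproduced because the abstract does not define it.

```python
def confusion_matrix(pred, gt, num_classes):
    """cm[g][p] counts pixels with ground-truth class g predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, g in zip(pred, gt):
        cm[g][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of pixels on the diagonal (correctly classified)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def mean_iou(cm):
    """Mean over classes of TP / (TP + FP + FN)."""
    k = len(cm)
    ious = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # row sum minus diagonal
        fp = sum(cm[r][c] for r in range(k)) - tp  # column sum minus diagonal
        if tp + fp + fn:  # skip classes absent from both prediction and ground truth
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)


cm = confusion_matrix(pred=[0, 1, 1, 1], gt=[0, 0, 1, 1], num_classes=2)
print(pixel_accuracy(cm), mean_iou(cm))
```

Mean IoU penalizes both over- and under-segmentation per class, so it is usually lower than pixel accuracy when one class dominates the image.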
- Conference Article
1
- 10.1061/41171(401)202
- Apr 13, 2011
- Structures Congress 2011
This damage assessment relied heavily on the use of remote sensing technology. Never before has high-resolution satellite and aerial imagery been so openly available and accessible. Data from different missions (World Bank-ImageCat-RIT Remote Sensing Mission (15 cm optical and 2 pts/m² LiDAR), Google (15 cm optical), NOAA (25 cm optical), Pictometry, as well as satellite imagery from GeoEye and DigitalGlobe) have allowed damage from the Haiti earthquake to be viewed through multiple sensors and at different times. These multi-dimensional perspectives have been invaluable in understanding the magnitude and scope of damage caused by this earthquake.
- Research Article
5
- 10.1371/journal.pone.0320452
- Mar 26, 2025
- PloS one
Archaeologists often use high-resolution satellite imagery to identify potential archaeological sites or features, including ancient settlements, burial mounds, roads, and even subtle differences in vegetation or topography. Over the last several decades, satellite imagery and other remote sensing techniques (including aerial photography and LiDAR data) have been used to thoroughly map the extensive settlement complex of the Greater Angkor Region (1,500 km², 9th-14th centuries CE) in present-day Cambodia. While we now have a comprehensive map of this area, the landscapes beyond the Greater Angkor Region that formed the Angkorian cultural sphere have not been mapped, even though the density of features on the landscape seems to continue beyond the area considered Greater Angkor. While a comprehensive settlement study of the entire Angkorian realm would be incredibly helpful in understanding patterns of ancient urbanism and early statehood in Southeast Asia, mapping this area through manual identification of archaeological features in satellite imagery would be highly time-consuming. In this paper, we employ a state-of-the-art deep learning model for semantic segmentation based on DeepLab v3+ to identify one typical and characteristic feature: Angkor-period reservoirs. Our results indicate that this AI model is accurate enough to provide a valuable "second opinion" to landscape archaeologists, enhancing and speeding up their mapping process and making them substantially more productive. The deep learning model for semantic segmentation employed here, which can be trained on other types of archaeological and non-archaeological features worldwide, will be a valuable tool for areas of research that involve intensive manual investigation and interpretation of satellite imagery and will aid researchers as they continue to map the Angkorian world.
- Research Article
21
- 10.1109/jstars.2021.3123398
- Jan 1, 2021
- IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Urban buildings are essential components of cities and an indispensable source of urban geographic information. While many research efforts have focused on urban building extraction, few studies have addressed large-scale urban building mapping from satellite images. In this research, a hierarchical scheme for large-scale urban building mapping based on Gaofen-2 (GF-2) satellite images is proposed. In this hierarchical approach, urban buildings are regarded as a mixture of dense low-rise buildings (DLB) and sparse independent buildings (SIB) stacked in space, which are extracted by a semantic segmentation model and an instance segmentation model, respectively. In this study, GF-2 images and OpenStreetMap data were used to extract DLB using U2-Net with focal loss. GF-2 images were used to extract SIB using an improved CenterMask model with a deformable convolution network and a spatial coordinate attention module. The main urban area within the 5th Ring Road of Beijing was selected as the study area. The GF-2 image tiles of Beijing were input into the trained models to first derive coarse maps of DLB and SIB, and post-processing optimization was performed after combining the maps. The accuracy assessment shows that the overall accuracy of large-scale urban building mapping using the proposed hierarchical approach reaches 91.5%, which is 4.8% higher than that of a traditional method. Overall, the proposed hierarchical approach is effective for large-scale urban building mapping and provides new application opportunities.
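The DLB branch above is trained with focal loss, which down-weights pixels the model already classifies confidently so that training concentrates on hard pixels. Below is a minimal binary focal-loss sketch following the standard formulation FL(p_t) = -α_t (1 - p_t)^γ log(p_t); the defaults α = 0.25 and γ = 2 are the commonly cited values from the original focal loss work and are assumptions here, since the abstract gives no hyperparameters.

```python
import math


def focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over pixels.

    probs:   predicted foreground probabilities
    targets: 0/1 ground truth
    alpha, gamma: assumed defaults, not values reported by the paper
    """
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1 - eps)          # avoid log(0)
        p_t = p if y == 1 else 1 - p           # probability of the true class
        alpha_t = alpha if y == 1 else 1 - alpha
        total += -alpha_t * (1 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


# An easy pixel (p=0.95) contributes far less than a hard one (p=0.3).
print(focal_loss([0.95], [1]), focal_loss([0.3], [1]))
```

With γ = 0 and α = 0.5 the expression reduces to half the ordinary binary cross-entropy, which is a convenient sanity check.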
- Conference Article
2
- 10.1109/csde56538.2022.10089353
- Dec 18, 2022
During or after natural disasters, information about location, cause, and severity is crucial for early responders to act accordingly. Building damage is one of the major, recurring consequences of disasters. Estimating the extent and location of damaged buildings is important so that emergency personnel and rescue teams can direct their efforts to the right buildings in affected locations. Satellite imagery is a powerful visual resource for assessing the extent of damage over a wide geographical area. However, current post-disaster practice requires manual annotation of damaged buildings, which is labor-intensive and time-consuming. Meanwhile, Deep Learning (DL) architectures such as Convolutional Neural Networks (CNNs) have outperformed traditional damage detection methods in terms of accuracy. We therefore developed a novel framework named Multi-scale Siamese Building Damage Assessment Network (MSBDA-Net). The framework follows a two-step approach. The first stage is building localization, in which a mask of all buildings before the disaster is generated. The second stage is a multi-scale Siamese damage assessment model, in which the network takes pre- and post-disaster image pairs as input and classifies buildings into different damage levels. The evaluation results indicate the applicability of the proposed method to both building segmentation (F1-score = 86.3%) and damage assessment (F1-score = 78.44%).
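In a two-stage design like this, the pre-disaster building mask from stage one and the per-pixel damage map from stage two still have to be combined into one damage level per building. A common way to do this, assumed here rather than taken from MSBDA-Net, is to split the mask into connected components and give each building the majority damage label of its pixels. The integer damage codes and toy arrays are hypothetical.

```python
from collections import Counter, deque


def label_components(mask):
    """4-connected component labelling of a binary mask (list of lists).

    Returns (labels, count); label 0 means background.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not labels[r][c]:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one building
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def per_building_damage(building_mask, damage_map):
    """Assign each building instance the most frequent damage label of its pixels."""
    labels, n = label_components(building_mask)
    votes = {k: Counter() for k in range(1, n + 1)}
    for r, row in enumerate(labels):
        for c, k in enumerate(row):
            if k:
                votes[k][damage_map[r][c]] += 1
    return {k: v.most_common(1)[0][0] for k, v in votes.items()}


mask = [[1, 1, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 1]]
damage = [[0, 1, 9, 9],   # 9 marks background pixels; their values are ignored
          [1, 1, 9, 2],
          [9, 9, 9, 2]]
print(per_building_damage(mask, damage))
```

Voting per instance gives every building exactly one label, which is what per-building F1 scoring expects, at the cost of hiding partial damage within a footprint.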
- Research Article
45
- 10.1117/1.jrs.11.046024
- Dec 18, 2017
- Journal of Applied Remote Sensing
The assessment of building damage following a natural disaster is a crucial step in determining the impact of the event itself and gauging reconstruction needs. Automatic methods for deriving damage maps from remotely sensed data are preferred, since they are regarded as being rapid and objective. We propose an algorithm for performing unsupervised building segmentation and damage assessment using airborne light detection and ranging (lidar) data. Local surface properties, including normal vectors and curvature, were used along with region growing to segment individual buildings in lidar point clouds. Damaged building candidates were identified based on rooftop inclination angle, and damage was then assessed using planarity and point height metrics. Validation of the building segmentation and damage assessment techniques was performed using airborne lidar data collected after the Haiti earthquake of 2010. Building segmentation and damage assessment accuracies of 93.8% and 78.9%, respectively, were obtained using lidar point clouds and expert damage assessments of 1953 buildings in heavily damaged regions. We believe this research indicates the utility of airborne lidar remote sensing for increasing the efficiency and speed at which emergency response operations are performed.
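The rooftop inclination angle used above to flag damaged-building candidates is the angle between a roof surface's normal vector and the vertical. The sketch below estimates a normal from three lidar points with a cross product and converts it to an inclination in degrees; a real pipeline would fit a plane to many points, and the threshold passed to `is_steep` is a hypothetical parameter, not a value from the paper.

```python
import math


def plane_normal(p1, p2, p3):
    """Normal of the plane through three 3D points (cross product of two edges)."""
    u = [p2[i] - p1[i] for i in range(3)]
    v = [p3[i] - p1[i] for i in range(3)]
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def inclination_deg(normal):
    """Angle between the surface normal and the vertical axis, in degrees."""
    nx, ny, nz = normal
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    cos_angle = min(1.0, abs(nz) / norm)  # clamp against rounding above 1
    return math.degrees(math.acos(cos_angle))


def is_steep(normal, max_deg):
    """Hypothetical candidate test: flag surfaces tilted more than max_deg."""
    return inclination_deg(normal) > max_deg


flat = plane_normal((0, 0, 5), (1, 0, 5), (0, 1, 5))
tilted = plane_normal((0, 0, 0), (1, 0, 1), (0, 1, 0))
print(inclination_deg(flat), inclination_deg(tilted))
```

Using the absolute value of the vertical component makes the angle independent of whether the normal points up or down.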
- Conference Article
10
- 10.1109/dasa54658.2022.9765025
- Mar 23, 2022
Damage assessment is a reasonable means of establishing procedures that provide speedy and dependable attention during natural calamities such as hurricanes. Lately, disaster researchers have often used satellite imagery to estimate the number of damaged properties. Damaged structures can be detected in a timely manner by combining satellite imagery with Convolutional Neural Network (CNN) transfer learning. Consequently, it is necessary to identify the variables that determine transfer-learning success in this scenario. To identify damaged structures after a hurricane, we introduce a VGG16-based technique that uses satellite imagery features of the hurricane-affected region. A global average pooling layer replaces the fully connected layer to reduce the number of parameters and speed up convergence. The experimental results indicate that the proposed model's overall accuracy for post-hurricane image classification can reach 0.98. Our proposed method is compared with the classical CNN, VGG16, VGG19, and AlexNet and surpasses their performance.
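The parameter saving from replacing the fully connected layer with global average pooling (GAP) can be made concrete with a little arithmetic. For a 224 × 224 input, VGG16's last convolutional block outputs a 7 × 7 × 512 feature map; flattening it into the first 4096-unit dense layer needs over 100 million weights, whereas GAP reduces each channel to its mean and leaves only a small classifier. The two-class head below (damaged vs. undamaged) is an assumption, since the abstract does not give the class count.

```python
def global_average_pool(feature_map):
    """feature_map: list of channels, each an H x W list of lists -> one mean per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


flatten_fc = dense_params(7 * 7 * 512, 4096)  # VGG16's first FC layer on the flattened map
gap_head = dense_params(512, 2)               # assumed 2-class head after GAP
print(flatten_fc, gap_head)

# Toy 2-channel, 2 x 2 feature map.
print(global_average_pool([[[1, 2], [3, 4]], [[0, 0], [0, 8]]]))
```

GAP itself has no trainable parameters, so the entire saving comes from shrinking the classifier's input from 25,088 values to 512.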
- Conference Article
4
- 10.1109/raics.2011.6069400
- Sep 1, 2011
In this paper, we propose a novel global region-based segmentation method for satellite and medical images that uses a geometric active contour model and level set evolution on images corrupted by salt-and-pepper noise. The active contour, or snake, model is one of the most successful variational models in image segmentation and has been widely used to locate boundaries in image segmentation and computer vision. However, local minima in the active contour energy function give snakes poor convergence during segmentation, which has limited their applications. In this work, a fast minimization of the snake model is used to segment satellite and medical images to which 10% noise was added. The method provides satisfactory results and is therefore a good candidate for medical image segmentation. Experiments on noisy satellite images demonstrate the advantages of the proposed method over the Chan-Vese (CV) active contour: it needs fewer iterations and has lower time complexity because it uses isotropic schemes to regularize the contour, and it is sub-pixel precise. Finally, its memory requirement is low.
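For context on the Chan-Vese baseline mentioned above, its region-fitting idea can be illustrated in isolation: the image is split into two regions whose mean intensities c1 and c2 are alternately updated, and each pixel joins the region whose mean it is closer to. The sketch below keeps only these fitting terms (λ1 = λ2 = 1, contour-length and area weights set to zero), which reduces to a two-means clustering of intensities. It deliberately omits the level-set curvature regularization, so it is neither the paper's method nor full Chan-Vese.

```python
def chan_vese_fitting_only(pixels, iters=20):
    """Two-phase piecewise-constant segmentation using only the Chan-Vese fitting terms.

    Returns (inside, c1, c2): inside[i] is True if pixel i belongs to the
    brighter-initialised region with mean c1.
    """
    c1, c2 = max(pixels), min(pixels)  # initial region means
    inside = [False] * len(pixels)
    for _ in range(iters):
        # Each pixel joins the region whose mean minimises (I - c)^2.
        inside = [(p - c1) ** 2 < (p - c2) ** 2 for p in pixels]
        in_vals = [p for p, s in zip(pixels, inside) if s]
        out_vals = [p for p, s in zip(pixels, inside) if not s]
        new_c1 = sum(in_vals) / len(in_vals) if in_vals else c1
        new_c2 = sum(out_vals) / len(out_vals) if out_vals else c2
        if (new_c1, new_c2) == (c1, c2):  # converged
            break
        c1, c2 = new_c1, new_c2
    return inside, c1, c2


# Toy 1-D "image": dark background with a bright object.
print(chan_vese_fitting_only([10, 12, 11, 200, 205, 210]))
```

Without the length term nothing discourages isolated misclassified pixels, which is why salt-and-pepper noise is a meaningful stress test for contour-based methods that do regularize the boundary.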
- Research Article
9
- 10.3390/electronics12040896
- Feb 9, 2023
- Electronics
Federated deep learning frameworks can be used strategically to monitor land use locally and infer environmental impacts globally. Distributed data from across the world would be needed to build a global model for land use classification. A federated approach is needed in this application domain to avoid transferring data from distributed locations and to save network bandwidth, reducing communication costs. We used a federated UNet model for the semantic segmentation of satellite and street-view images. The novelty of the proposed architecture lies in the integration of knowledge distillation to reduce communication costs and response times. The accuracy obtained was above 95%, and we also achieved significant model compression of over 17 times and 62 times for street-view and satellite images, respectively. Our proposed framework has the potential to significantly improve the efficiency and privacy of real-time tracking of climate change across the planet.
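The abstract credits knowledge distillation for the compression but does not specify the loss. A widely used formulation, assumed here, trains the compact student to match the teacher's temperature-softened class distribution, scaling the KL divergence by T² so gradient magnitudes stay comparable across temperatures. The temperature value and the logits below are hypothetical.

```python
import math


def softmax(logits, T=1.0):
    """Numerically stable softmax with temperature T."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def distillation_loss(student_logits, teacher_logits, T=2.0):
    """T^2 * KL(teacher || student) on temperature-softened distributions.

    T = 2.0 is an assumed default, not a value reported by the paper.
    """
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return T * T * kl


# Hypothetical per-pixel class logits from a teacher and a smaller student.
print(distillation_loss([0.5, 1.0, 0.2], [1.5, 2.0, 0.1]))
```

In practice this term is typically mixed with the ordinary cross-entropy on ground-truth labels; the mixing weight is another hyperparameter the abstract does not state.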