Estimating the grade of storm surge disaster loss in coastal areas of China via machine learning algorithms

Xiaomin Li, Jie Zhang, Xuexue Du, Qi Hou, Suming Zhang, Xifang Jin, Tangqi Zhao

doi:10.1016/j.ecolind.2022.108533

Abstract

Storm surge is the most severe marine disaster in China, affecting the entire coastal area. Estimating storm surge disaster loss (SSDL) is important for disaster prevention, sustainability, and decision-making. Taking 11 provincial administrative regions in the coastal areas of China as the study area, this paper estimates SSDL grades using four machine learning (ML) algorithms. A total of 132 official open-source records of storm surge disasters were collected and divided into a cross-validation set (CV set) and a test set. First, a comprehensive system of 50 indicators was constructed from three perspectives: the hazard of disaster-causing factors (16 indicators), and the vulnerability (22) and resilience (12) of disaster-bearing bodies. Several data preprocessing methods, such as normalization and SMOTE, were applied to improve model performance. Then, Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Logistic Model Tree (LMT), and K-star were applied to construct estimation models of SSDL grades. Principal component analysis (PCA) and recursive feature elimination (RFE) were adopted for intelligent screening of the indicators. Finally, the models’ performance was compared using the Precision, Recall, F1 score, and Kappa metrics. The results show that sound and efficient data preparation is essential to the reliability and stability of the models. RFE proved more suitable than PCA for indicator selection in this study. The importance ranking from RFE enhances the interpretability of the ML models and shows that hazard indicators are the most important, vulnerability indicators the second most important, and resilience indicators the least important. The 27-indicator K-star model, which offers accurate estimation, strong generalization, and a lower workload, is the optimal SSDL estimation model.
The optimal SSDL estimation model uses 27 input indicators. Its Precision, Recall, F1 score, and Kappa are 0.838, 0.832, 0.827, and 0.776 on the CV set and 0.819, 0.786, 0.781, and 0.714 on the test set, respectively. This paper provides a scientific basis for government decision-making and risk management, and it can serve as a demonstration case for SSDL research.
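As an illustration of the four evaluation metrics named above, the following minimal sketch computes Precision, Recall, F1 score, and Cohen's Kappa for multi-class grade labels in plain Python. It is not the authors' code: the function name `classification_metrics`, the class-frequency (weighted) averaging of the per-class scores, and the toy grade labels are all assumptions made for illustration, since the abstract does not state how the per-class scores were averaged.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Weighted precision, recall, F1 and Cohen's kappa for multi-class labels.

    Assumption: per-class scores are averaged with weights equal to each
    class's share of the true labels (the paper may average differently).
    """
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    true_counts = Counter(y_true)   # samples per true grade
    pred_counts = Counter(y_pred)   # samples per predicted grade
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for c in labels:
        p_c = hits[c] / pred_counts[c] if pred_counts[c] else 0.0
        r_c = hits[c] / true_counts[c] if true_counts[c] else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        w = true_counts[c] / n      # class weight
        precision += w * p_c
        recall += w * r_c
        f1 += w * f_c

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_o = sum(hits.values()) / n
    p_e = sum(true_counts[c] * pred_counts[c] for c in labels) / n ** 2
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0

    return {"precision": precision, "recall": recall, "f1": f1, "kappa": kappa}


if __name__ == "__main__":
    # Hypothetical loss grades (1 = lightest, 3 = heaviest), not real data.
    observed = [1, 1, 2, 2, 3, 3]
    estimated = [1, 1, 2, 3, 3, 3]
    print(classification_metrics(observed, estimated))
```

For the toy labels, five of six grades agree and the chance agreement is 1/3, so Kappa is approximately 0.75. Kappa is a useful complement to the other three metrics because it discounts agreement that an estimator would reach by chance on imbalanced grade distributions.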

Full Text