Urban expansion trends and their relationship with flood susceptibility during the period 2014–2024 in Hanoi City
Over the past few decades, urban expansion has accelerated worldwide. This process can increase future flood risks due to local changes in hydrological conditions and the increased exposure and vulnerability of communities in flood-prone areas. Therefore, assessing the impact of urban expansion on flood susceptibility is an important task that can support local authorities in urban planning and in mitigating flood impacts. The objective of this study was to assess the impact of urban expansion on flood susceptibility in Hanoi using machine learning models: Deep Neural Networks (DNN), Adaptive Boosting (ADB), Extreme Gradient Boosting (XGB), and Random Forest (RF). A total of 1058 flood points and 14 conditioning factors corresponding to 2014 and 2024 were used as input to the models. Statistical indices, including Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Area Under the Curve (AUC), and Coefficient of Determination (R2), were used to evaluate the performance of the proposed models. The results showed that the DNN model achieved the highest performance in assessing the impact of urban expansion on flood susceptibility (AUC=0.92), followed by XGB (0.91), ADB (0.86), and RF (0.82). During 2014–2024, urban expansion combined with the impacts of climate change has significantly increased the areas susceptible to flooding. In Hanoi, areas in the "high" and "very high" flood-susceptibility categories have been expanding continuously, accounting for about 25% of the total study area. In contrast, the "medium" group shows a slight decreasing trend, while the "low" and "very low" areas have narrowed. This shows that urban expansion is increasing the area prone to flooding. The results of this study provide a solid scientific basis, supporting planners and policymakers in identifying limitations in current flood risk adaptation measures and in developing more appropriate spatial and temporal strategies to minimize flood impacts.
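The four indices named in this abstract (RMSE, MAE, AUC, R2) recur throughout the entries listed here. As a rough illustration of what each one measures, the following is a minimal sketch in plain Python. The function names and the six toy labels and scores are assumptions made for illustration; they are not data or code from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def auc(y_true, y_score):
    """Area under the ROC curve via pairwise ranking.

    Equals the fraction of (flood, non-flood) pairs in which the flood
    point receives the higher score; ties count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = flood point, 0 = non-flood point.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]

print("RMSE:", round(rmse(labels, scores), 3))
print("MAE: ", round(mae(labels, scores), 3))
print("R2:  ", round(r2(labels, scores), 3))
print("AUC: ", round(auc(labels, scores), 3))  # 8 of the 9 pairs are ranked correctly
```

RMSE, MAE and R2 compare the predicted susceptibility score with the 0/1 label directly, whereas AUC depends only on how the points are ranked, which is why a model can score well on one index and less well on another.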
- Research Article
22
- 10.2166/wpt.2023.088
- Jun 1, 2023
- Water Practice & Technology
Flood damage is becoming increasingly severe in the context of climate change and changes in land use. Assessing the effects of these changes on floods is important, to help decision-makers and local authorities understand the causes of worsening floods and propose appropriate measures. The objective of this study was to evaluate the effects of climate and land use change on flood susceptibility in Thua Thien Hue province, Vietnam, using machine learning techniques (support vector machine (SVM) and random forest (RF)) and remote sensing. The machine learning models used a flood inventory including 1,864 flood locations and 11 conditional factors in 2017 and 2021, as the input data. The predictive capacity of the proposed models was assessed using the area under the curve (AUC), the root mean square error (RMSE), and the mean absolute error (MAE). Both proposed models were successful, with AUC values exceeding 0.95 in predicting the effects of climate and land use change on flood susceptibility. The RF model, with AUC = 0.98, outperformed the SVM model (AUC = 0.97). The areas most susceptible to flooding increased between 2017 and 2021 due to increased built-up area.
- Research Article
9
- 10.1007/s11069-022-05584-5
- Sep 5, 2022
- Natural Hazards
Wadi El-Matulla, located in the eastern desert of Egypt, is the most important water basin. The Qift–Qusayr highway (west–east direction) and the Cairo–Aswan eastern desert highway (north–south direction) pass through the watershed. Many urban areas (villages and industrial areas) and agricultural lands are located at the outlet of these basins. In addition, the basin has promising potential for future economic and urban development as it is located within the Golden Triangle (governmental megaproject). The current study investigates flood hazard modeling and its impact on the area. To determine the optimal flood susceptibility mapping algorithm, performance comparisons of three techniques were conducted: logistic regression (LR), extreme gradient boosting (EGB), and random forest (RF). Remote sensing, topographic, geologic, and meteorological data were used with the help of field visits to provide the spatial and inventory database required by the models. The performance and reliability of the predictions of the proposed models were evaluated using five statistical indices: receiver operating characteristic–area under the curve, overall accuracy (OAC), kappa index, root mean square error (RMSE), and mean absolute error (MAE). The models achieved ROC values of 93, 86 and 80%, OAC of 88, 82 and 76%, kappa index values of 0.85, 0.75 and 0.51, RMSE of 0.34, 0.42 and 0.49, and MAE of 0.12, 0.18 and 0.24 for RF, EGB, and LR, respectively. Based on AUC values, the RF and EGB models provide excellent and very good prediction of flood susceptibility, respectively. Our results show that RF is the optimal algorithm for flood susceptibility mapping, followed by EGB and LR. Consequently, the predictive power of the RF model is quite good, and the flood susceptibility map was classified into five classes, namely very low (51.7%), low (23.7%), moderate (16.2%), high (7.1%), and very high (1.3%).
Ultimately, the RF model was verified using Sentinel-1 imagery for real floods in 2016 and 2021, and it showed good agreement. The optimal model could be useful for decision makers and planners to protect existing facilities and plan future projects in non-flood-prone areas. Accordingly, the most suitable areas for future development need to be distributed mainly in the low and very low flood hazard areas.
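Overall accuracy and the kappa index reported in this abstract are both derived from a binary confusion matrix. The sketch below shows the standard definitions in plain Python. The confusion-matrix counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the study.

```python
def overall_accuracy(tp, fp, fn, tn):
    """Share of all points (flood and non-flood) that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    p_observed = (tp + tn) / n
    # Chance agreement: product of the marginal totals for each class.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical counts: 45 flood points and 55 non-flood points.
tp, fp, fn, tn = 40, 10, 5, 45

print("Overall accuracy:", overall_accuracy(tp, fp, fn, tn))
print("Kappa index:     ", round(cohen_kappa(tp, fp, fn, tn), 3))
```

Because kappa subtracts the agreement expected by chance, it is always lower than overall accuracy for an imperfect classifier, which is consistent with the pattern of values reported above for all three models.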
- Research Article
- 10.15625/2615-9783/22711
- Apr 16, 2025
- Vietnam Journal of Earth Sciences
The Mekong Basin is the most critical transboundary river basin in Asia. This basin provides an abundant source of fresh water essential for the development of agriculture, domestic consumption, and industry, as well as for the production of hydroelectricity, and it also contributes to ensuring food security worldwide. This region is often subject to floods that cause significant damage to human life, society, and the economy. However, flood risk management challenges in this region are increasingly substantial due to conflicting objectives between several countries and data sharing. This study integrates deep learning with optimization algorithms, namely Grasshopper Optimisation Algorithm (GOA), Adam and Stochastic Gradient Descent (SGD), and open-source datasets to identify regions where floods are likely to occur in the Mekong basin, covering Vietnam and Cambodia. Various statistical indices, namely Area Under the Curve (AUC), root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²), were used to evaluate flood susceptibility models. The results show that the proposed models performed well, with AUC values above 0.8. In particular, the DNN-Adam model achieved an AUC of 0.98, outperforming DNN-GOA (AUC = 0.89), DNN-SGD (AUC = 0.87), and XGB (AUC = 0.82). Regions with very high flood susceptibility are concentrated in the Mekong Delta of Vietnam and along the Mekong River in Cambodia. The findings of this study are significant in supporting decision-makers or planners in proposing appropriate flood mitigation strategies, planning policies, and strategies, particularly in the Mekong River watershed.
- Research Article
58
- 10.3390/rs16050858
- Feb 29, 2024
- Remote Sensing
Flood susceptibility mapping plays a crucial role in flood risk assessment and management. Accurate identification of areas prone to flooding is essential for implementing effective mitigation measures and informing decision-making processes. In this regard, the present study used high-resolution remote sensing products, i.e., synthetic aperture radar (SAR) images for flood inventory preparation and integrated four machine learning models (Random Forest: RF, Classification and Regression Trees: CART, Support Vector Machine: SVM, and Extreme Gradient Boosting: XGBoost) to predict flood susceptibility in Metlili watershed, Morocco. Initially, 12 independent variables (elevation, slope angle, aspect, plan curvature, topographic wetness index, stream power index, distance from streams, distance from roads, lithology, rainfall, land use/land cover, and normalized vegetation index) were used as conditioning factors. The flood inventory dataset was divided into 70% and 30% for training and validation purposes using a popular library, scikit-learn (i.e., train_test_split) in Python programming language. Additionally, the area under the curve (AUC) was used to evaluate the performance of the models. The accuracy assessment results showed that RF, CART, SVM, and XGBoost models predicted flood susceptibility with AUC values of 0.807, 0.780, 0.756, and 0.727, respectively. Thus, the RF model performed best at flood susceptibility prediction among the models applied. As per this model, 22.49%, 16.02%, 12.67%, 18.10%, and 31.70% areas of the watershed are estimated as being very low, low, moderate, high, and very highly susceptible to flooding, respectively. Therefore, this study showed that the integration of machine learning models with radar data could have promising results in predicting flood susceptibility in the study area and other similar environments.
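The 70/30 split with scikit-learn's train_test_split described in this abstract can be sketched as follows. Everything apart from the 70/30 ratio, the 12 conditioning factors, and the use of train_test_split, Random Forest and AUC is an assumption made for illustration: the data are synthetic, and the stratify and random_state settings are not stated in the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a flood inventory: 100 points x 12 conditioning factors.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 12))
y = np.array([1] * 50 + [0] * 50)  # 1 = flood point, 0 = non-flood point
X[:50, 0] += 2.0  # make one hypothetical factor informative for the flood class

# 70% training / 30% validation. Stratifying keeps the flood/non-flood
# ratio identical in both subsets (an assumption, not stated in the abstract).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Fit a Random Forest and score the held-out 30% with AUC.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
rf_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print("Training points:  ", len(X_train))
print("Validation points:", len(X_test))
print("Validation AUC:   ", round(rf_auc, 3))
```

Fixing random_state makes the split reproducible, which matters when several models are compared on the same validation subset, as in the four-model comparison reported here.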
- Research Article
48
- 10.2166/wcc.2022.435
- Jun 1, 2022
- Journal of Water and Climate Change
Due to the physical processes of floods, the use of data-driven machine learning (ML) models is a cost-efficient approach to flood modeling. The innovation of the current study revolves around the development of tree-based ML models, including Rotation Forest (ROF), Alternating Decision Tree (ADTree), and Random Forest (RF) via binary particle swarm optimization (BPSO), to estimate flood susceptibility in the Maneh and Samalqan watershed, Iran. Therefore, to implement the models, 370 flood-prone locations in the case study were identified (2016–2019). In addition, 20 hydrogeological, topographical, geological, and environmental criteria affecting flood occurrence in the study area were extracted to predict flood susceptibility. The area under the curve (AUC) and a variety of other statistical indicators were used to evaluate the performances of the models. The results showed that the RF-BPSO (AUC=0.935) has the highest accuracy compared to ROF-BPSO (AUC=0.904), and ADTree-BPSO (AUC=0.923). In addition, the findings illustrated that the chance of flooding in the center of the area in question is greater than in other points due to lower elevation, lower slope, and proximity to rivers. Therefore, the ensemble framework proposed here can also be used to predict flood susceptibility maps in other regions with similar geo-environmental characteristics for flood management and prevention.
- Research Article
22
- 10.1016/j.jenvman.2023.118790
- Aug 28, 2023
- Journal of Environmental Management
A new approach based on biology-inspired metaheuristic algorithms in combination with random forest to enhance the flood susceptibility mapping
- Research Article
24
- 10.1038/s41598-025-97258-y
- Apr 15, 2025
- Scientific Reports
The current study investigates the application of artificial intelligence (AI) techniques, including machine learning (ML) and deep learning (DL), in predicting the ultimate load-carrying capacity and ultimate strain of both hollow and solid hybrid elliptical fiber-reinforced polymer (FRP)–concrete–steel double-skin tubular columns (DSTCs) under axial loading. Implemented AI techniques include five ML models, namely Gene Expression Programming (GEP), Artificial Neural Network (ANN), Random Forest (RF), Adaptive Boosting (ADB), and eXtreme Gradient Boosting (XGBoost), and one DL model, Deep Neural Network (DNN). Due to the scarcity of experimental data on hybrid elliptical DSTCs, an accurate finite element (FE) model was developed to provide additional numerical insights. The reliability of the proposed nonlinear FE model was validated against existing experimental results. The validated model was then employed in a parametric study to generate 112 data points. The parametric study examined the impact of concrete strength, the cross-sectional size of the inner steel tube, and FRP thickness on the ultimate load-carrying capacity and ultimate strain of both hollow and solid hybrid elliptical DSTCs. The effectiveness of the AI application was assessed by comparing the models’ predictions with FE results. Among the models, XGBoost and RF achieved the best performance in both training and testing with respect to the determination coefficient (R2), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE) values. The study provided insights into the contributions of individual features to predictions using the SHapley Additive exPlanations (SHAP) approach.
The results from SHAP, based on the best prediction performance of the XGBoost model, indicate that the area of the concrete core has the most significant effect on the load-carrying capacity of hybrid elliptical DSTCs, followed by the unconfined concrete strength and the total thickness of FRP multiplied by its elastic modulus. Additionally, a user interface platform was developed to streamline the practical application of the proposed AI models in predicting the axial capacity of DSTCs.
- Research Article
74
- 10.1371/journal.pone.0317619
- Jan 23, 2025
- PloS one
This study presents a comprehensive comparative analysis of Machine Learning (ML) and Deep Learning (DL) models for predicting Wind Turbine (WT) power output based on environmental variables such as temperature, humidity, wind speed, and wind direction. Along with Artificial Neural Network (ANN), Long Short-Term Memory (LSTM), Recurrent Neural Network (RNN), and Convolutional Neural Network (CNN), the following ML models were examined: Linear Regression (LR), Support Vector Regressor (SVR), Random Forest (RF), Extra Trees (ET), Adaptive Boosting (AdaBoost), Categorical Boosting (CatBoost), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). Using a dataset of 40,000 observations, the models were assessed based on R-squared, Mean Absolute Error (MAE), and Root Mean Square Error (RMSE). ET achieved the highest performance among ML models, with an R-squared value of 0.7231 and an RMSE of 0.1512. Among DL models, ANN demonstrated the best performance, achieving an R-squared value of 0.7248 and an RMSE of 0.1516. The results show that DL models, especially ANN, did slightly better than the best ML models. This means that they are better at modeling non-linear dependencies in multivariate data. Preprocessing techniques, including feature scaling and parameter tuning, improved model performance by enhancing data consistency and optimizing hyperparameters. When compared to previous benchmarks, the performance of both ANN and ET demonstrates significant predictive accuracy gains in WT power output forecasting. This study's novelty lies in directly comparing a diverse range of ML and DL algorithms while highlighting the potential of advanced computational approaches for renewable energy optimization.
- Research Article
31
- 10.1016/j.compag.2023.108140
- Aug 15, 2023
- Computers and Electronics in Agriculture
Exploring interpretable and non-interpretable machine learning models for estimating winter wheat evapotranspiration using particle swarm optimization with limited climatic data
- Research Article
40
- 10.1186/s12302-025-01078-w
- Mar 3, 2025
- Environmental Sciences Europe
The pollution in Dhaka's navigable waterways, including the Buriganga, Balu, Tongi Khal, and Turag rivers, is a significant concern due to rapid industrial and urban expansion. Industrial discharges, domestic sewage and inadequate waste management are the primary sources of this pollution, degrading water quality and threatening aquatic ecosystems. This study aimed to predict the Water Quality Index (WQI) of these rivers using fourteen machine learning (ML) models: Decision Tree Regression, Linear Regression, Ridge Regression, Stochastic Gradient Descent (SGD) Regressor, Extreme Gradient Boosting (XGB) Regressor, Light Gradient Boosting Machine (GBM) Regressor, Elastic Net Regressor, Support Vector Regression (SVM), Random Forest Regression, Bayesian Ridge Regressor, Artificial Neural Network (ANN), AdaBoost Regressor, CatBoost Regressor and Extra Trees Regressor. The objective was to evaluate and compare these models to identify the most effective predictive method for WQI, enabling efficient environmental monitoring and management of urban waterways. Among the evaluated ML models, ANN and Random Forest Regressor performed the best. The ANN model demonstrated superior predictive capability, achieving a Root Mean Squared Error (RMSE) of 2.34, a Mean Absolute Error (MAE) of 1.24, a Nash–Sutcliffe Efficiency (NSE) of 0.97, and a Coefficient of Determination (R2) of 0.97. Furthermore, an Adjusted R2 value of 0.965 further confirmed its ability to capture complex patterns in water quality data with remarkable accuracy. These findings emphasize the importance of using AI modeling techniques, specifically ANN and Random Forest Regression, to improve the accuracy of WQI forecasts for the waterways. This study contributes to the field of environmental science by offering a novel integration of feature selection techniques with ML models to enhance efficiency and cost-effectiveness of water quality monitoring. 
Unlike previous studies, this research specifically addresses the challenges of urban waterways in Dhaka, Bangladesh, a region significantly impacted by industrial and urban pollution. To our knowledge, this is the first study to apply such a comprehensive range of ML models to predict the WQI of Dhaka’s four major rivers. By providing a reliable methodology for WQI estimation, this study supports informed decision-making and proactive measures to protect vital water resources.
- Research Article
19
- 10.1016/j.engfailanal.2023.107864
- Dec 9, 2023
- Engineering Failure Analysis
Modeling of necking area reduction of carbon steel in hydrogen environment using machine learning approach
- Research Article
28
- 10.1016/j.cscm.2024.e03092
- Mar 27, 2024
- Case Studies in Construction Materials
A comprehensive comparison of various machine learning algorithms used for predicting the splitting tensile strength of steel fiber-reinforced concrete
- Research Article
82
- 10.3390/rs12203423
- Oct 18, 2020
- Remote Sensing
The uncertainty of flash floods makes them highly difficult to predict through conventional models. Physical hydrologic models for flash flood prediction over any large area are very difficult to compute, as they require a lot of data and time. Therefore, remote sensing data-based models (from statistical to machine learning) have become highly popular due to open data access and shorter prediction times. There is a continuous effort to improve the prediction accuracy of these models through introducing new methods. This study is focused on flash flood modeling through novel hybrid machine learning models, which can improve the prediction accuracy. The hybrid machine learning ensemble approaches that combine the three meta-classifiers (Real AdaBoost, Random Subspace, and MultiBoosting) with J48 (a tree-based algorithm that can be used to evaluate the behavior of the attribute vector for any defined number of instances) were used in the Gorganroud River Basin of Iran to assess flood susceptibility (FS). A total of 426 flood positions as dependent variables and a total of 14 flood conditioning factors (FCFs) as independent variables were used to model the FS. Several threshold-dependent and independent statistical tests were applied to verify the performance and predictive capability of these machine learning models, such as the receiver operating characteristic (ROC) curve of the success rate curve (SRC) and prediction rate curve (PRC), efficiency (E), root-mean-square error (RMSE), and true skill statistics (TSS). The valuation of the FCFs was done using AdaBoost, frequency ratio (FR), and Boosted Regression Tree (BRT) models. In the flooding of the study area, altitude, land use/land cover (LU/LC), distance to stream, normalized difference vegetation index (NDVI), and rainfall played important roles.
The Random Subspace J48 (RSJ48) ensemble method with an area under the curve (AUC) of 0.931 (SRC), 0.951 (PRC), E of 0.89, sensitivity of 0.87, and TSS of 0.78, has become the most effective ensemble in predicting the FS. The FR technique also showed good performance and reliability for all models. Map removal sensitivity analysis (MRSA) revealed that the FS maps have the highest sensitivity to elevation. Based on the findings of the validation methods, the FS maps prepared using the machine learning ensemble techniques have high robustness and can be used to advise flood management initiatives in flood-prone areas.
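The true skill statistic reported here is conventionally defined as sensitivity plus specificity minus one. The short sketch below uses that definition to back out the specificity implied by the reported sensitivity (0.87) and TSS (0.78). The function names are illustrative, and the final step assumes equal numbers of flood and non-flood points, which the abstract does not state.

```python
def true_skill_statistic(sensitivity, specificity):
    """TSS = sensitivity + specificity - 1 (ranges from -1 to 1)."""
    return sensitivity + specificity - 1.0


def implied_specificity(tss, sensitivity):
    """Rearranged TSS definition: specificity = TSS - sensitivity + 1."""
    return tss - sensitivity + 1.0


# Values reported in the abstract for the RSJ48 ensemble.
reported_sensitivity = 0.87
reported_tss = 0.78

spec = implied_specificity(reported_tss, reported_sensitivity)

# Under the ASSUMPTION of equally many flood and non-flood points,
# accuracy equals the mean of sensitivity and specificity.
balanced_accuracy = (reported_sensitivity + spec) / 2.0

print("Implied specificity:      ", round(spec, 2))
print("Accuracy if classes equal:", round(balanced_accuracy, 2))
```

If the classes were balanced, the resulting accuracy would be close to the reported efficiency (E) of 0.89, but that is only a consistency check under the stated assumption and not a result from the paper.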
- Research Article
- 10.3390/rs18081158
- Apr 13, 2026
- Remote Sensing
Every year, floods disrupt the lives of hundreds of millions of people worldwide. Their impacts are further intensified by climate change, rapid urbanization, and land-use changes, making it crucial to identify areas most susceptible to flooding. While machine learning (ML) models have proven effective in identifying flood susceptibility, their validity and the integration of human risk remain underexplored in geomorphologically complex and highly flood-prone regions. This study developed an ensemble ML framework for flood susceptibility mapping in the Kosi Megafan, located in Nepal and India. We compared its performance with established ML models and a one-dimensional convolutional neural network (1D-CNN), validated results using Dartmouth Flood Observatory (DFO) and Sentinel-1 SAR (Synthetic Aperture Radar) data, and assessed the population exposed to high-risk zones. A total of 13 (8 retained) flood conditioning factors (FCFs) were derived from remote sensing datasets, and a flood inventory was created to train multiple ML models, including Random Forest (RF), Extreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), Support Vector Machine (SVM), 1D-CNN, and a Stacked Ensemble model. Among these, the stacked ensemble model achieved the highest performance (AUC = 0.76, accuracy = 0.70, precision = 0.69, recall = 0.72, F1-score = 0.70). The resulting susceptibility map identified high-risk zones mainly in the southern and southwestern Megafan, showing strong spatial agreement with the Sentinel-1-derived flood inventory and the DFO flood data (1992–2022). This study highlights the effectiveness of combining SAR-derived flood evidence with ensemble ML approaches for accurate and scalable flood susceptibility mapping in data-scarce, hazard-prone basins. Ultimately, the research supports efforts to build resilience and mitigate the long-term impact of flooding in the region.
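A stacked ensemble of the kind described in this abstract trains several base classifiers and then fits a meta-learner on their cross-validated predictions. The following is a minimal sketch using scikit-learn's StackingClassifier, not the authors' implementation. The data are synthetic, scikit-learn's GradientBoostingClassifier stands in for XGBoost, the 1D-CNN is omitted, and the logistic-regression meta-learner and all hyperparameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a flood inventory with 8 retained conditioning factors.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Base learners; gradient boosting is a stand-in for XGBoost (assumption).
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("ada", AdaBoostClassifier(random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
]

# The meta-learner is trained on 5-fold out-of-fold predictions of the base learners.
stack = StackingClassifier(
    estimators=base_learners, final_estimator=LogisticRegression(), cv=5
)
stack.fit(X_train, y_train)

stack_auc = roc_auc_score(y_test, stack.predict_proba(X_test)[:, 1])
print("Stacked ensemble AUC on held-out data:", round(stack_auc, 3))
```

Using out-of-fold predictions to train the meta-learner keeps it from simply rewarding whichever base model overfits the training points most strongly.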
- Research Article
7
- 10.2166/wcc.2024.035
- Jul 22, 2024
- Journal of Water and Climate Change
The objective of this study was the development of a new machine learning model using a radial basis function neural network (RBFNN) to build flood susceptibility maps and damage assessment for the Phu Yen province of Vietnam. The model was optimized by five algorithms, namely Giant Trevally Optimization (GTO), Golden Jackal Optimization (GJO), Brown-Bear Optimization (BBO), Gray Wolf Optimizer (GWO), and Whale Optimization Algorithm (WOA), to identify the best model for establishing the flood susceptibility map. These models were evaluated using statistical indices such as root mean square error (RMSE), mean absolute error (MAE), receiver operating characteristic (ROC), area under the curve (AUC), and coefficient of determination (COD). The results showed that all five optimization algorithms successfully improved the performance of the RBFNN model. Among them, the hybrid model RBFNN–BBO had the highest performance, with AUC = 0.998 and R2 = 0.8, and the RBFNN–GTO model had the lowest performance, with AUC = 0.755 and R2 = 0.65. The regions identified as having high and very high flood susceptibility (1,075 km2) were concentrated on the plain and along three of the largest rivers in Phu Yen province.