Cost overruns, a leading cause of project failure, are attracting growing attention among construction stakeholders. A cost overrun prediction model can help identify the factors that lead to overruns and thereby improve cost estimates. However, the application of machine learning to archival data for estimating construction cost overruns is still in development. Motivated by this, we applied an Extreme Gradient Boosting (XGBoost) model to historical data from construction projects in Ghana completed between 2016 and 2018. Comparing actual and predicted costs showed good predictive performance, with RMSE, MSE, MAE, and MAPE values of 0.202, 0.041, 0.069, and 0.306, respectively. To make the model interpretable, we used SHAP values to visualize the effect of each feature on the cost overrun prediction. According to the SHAP ranking, the initial contract amount, the number of storeys, scope changes, and the initial duration are the variables that contribute most to predicting project completion costs and cost overruns. This research explores an innovative way to understand and evaluate the essential variables for developing a cost overrun prediction model that could aid cost estimation in the construction industry.
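For readers unfamiliar with the four error measures reported above, the following is a minimal sketch of their standard definitions, not the authors' code. The function name `regression_errors` and the sample values are hypothetical, and the sketch assumes MAPE is expressed as a fraction, which the abstract does not specify.

```python
import math

def regression_errors(actual, predicted):
    """Return MSE, RMSE, MAE, and MAPE for paired actual/predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    mse = sum(r * r for r in residuals) / n    # mean squared error
    mae = sum(abs(r) for r in residuals) / n   # mean absolute error
    # MAPE as a fraction (assumption); undefined if any actual value is zero.
    mape = sum(abs(r / a) for r, a in zip(residuals, actual)) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape}

# Hypothetical values for illustration only; not data from the study.
actual = [1.0, 2.0, 4.0]
predicted = [1.5, 2.0, 3.0]
errors = regression_errors(actual, predicted)
print(errors)
```

Because RMSE is the square root of MSE, the reported pair is internally consistent: the square root of 0.041 is approximately 0.202.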