In the early stages of engineering project management, accurate estimation of equipment costs plays a crucial role in project approval decisions. However, research on estimating the investment cost of water reuse equipment remains scarce, and cost predictions for such construction projects often suffer from low accuracy, limited generalizability, and inefficiency. Advanced machine learning (ML) methods, known for their ability to model complex decision-making processes, offer powerful solutions. In this study, four traditional models and four ensemble models were employed to predict the cost of water reuse equipment. The ensemble models showed significantly better predictive performance than the traditional models, with the three boosting ensemble models performing best (traditional models: training R² = 29.77%–94.85%, testing R² = 30.62%–71.72%; boosting ensemble models: training R² = 97.42%, testing R² = 82.16%–93.79%). Furthermore, the Shapley additive explanations (SHAP) method was used to reduce the feature set of the predictive models and to identify the key variables that influence the cost of water reuse equipment. The ensemble models retrained on the selected variables achieved strong predictive performance, with the Gradient Boosting Decision Tree (GBDT) outperforming the other models (training R² = 97.37%, testing R² = 93.86%). Water quantity, inflow conductivity, outflow conductivity, and recovery rate emerged as the critical factors affecting the cost of water reuse equipment. Overall, the proposed methods can enhance the versatility of cost prediction in environmental engineering scenarios, particularly for the construction costs of water treatment equipment.
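The performance figures above are reported as R² values in percent. The abstract does not state which definition of R² was used; the sketch below assumes the standard coefficient of determination, 1 − SS_res / SS_tot, scaled by 100. The function name `r2_percent` and the toy cost values are hypothetical illustrations and are not data from the study.

```python
def r2_percent(y_true, y_pred):
    """Coefficient of determination (R^2), expressed as a percentage.

    Assumes the standard definition 1 - SS_res / SS_tot; the study's exact
    evaluation procedure is not described in the abstract.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observed costs around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are identical")

    return 100.0 * (1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    # Hypothetical observed and predicted equipment costs (arbitrary units).
    observed = [10.0, 20.0, 30.0, 40.0]
    predicted = [12.0, 18.0, 33.0, 37.0]
    print(f"R^2 = {r2_percent(observed, predicted):.2f} %")
```

Computing this score separately on the training and testing sets gives the kind of paired figures quoted above; a large gap between the two would suggest overfitting.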