Abstract

Controlling the formation of disinfection byproducts (DBPs) requires prior knowledge of DBP formation potential. Mathematical models can accurately predict the formation of DBPs and reduce the need for laboratory tests and their associated costs. Researchers continue to develop new models for specific regions but rarely use external data sets to evaluate the predictive ability of earlier models. Most models focus on total trihalomethanes (THMs), and predictive models for emerging DBPs (e.g., chloral hydrate (CH)) are lacking. Moreover, few studies compare linear and machine learning (ML) algorithms for predicting the formation of DBPs. This study developed predictive models for CH, chloroform, THMs, dichloroacetic acid, trichloroacetic acid, and haloacetic acids using stepwise multiple linear regression and ML regression, with easily monitored water quality parameters (pH, UV254, and total organic carbon (TOC)) as predictors. Among these parameters, UV254 is the dominant predictor of the formation of the target DBPs and deserves more attention in future studies. The stepwise multiple linear regression model for CH is: Ln(CH) = 8.945 + 0.558 × Ln(UV254) − 2.37 × Ln(pH) + 0.152 × Ln(TOC). Support vector regression (mean absolute percentage error (MAPE) = 2.578–5.798%, R² = 0.665–0.802) and random forest regression (MAPE = 2.867–5.346%, R² = 0.671–0.965) performed better than traditional stepwise linear regression (MAPE = 2.857–6.671%, R² = 0.602–0.770) on the training and testing sets. This indicates that ML algorithms are viable alternatives to conventional linear regression in the management of DBPs.
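The reported CH equation can be evaluated directly. The following is a minimal sketch, not the authors' code: it assumes that Ln denotes the natural logarithm and that inputs are supplied in the same units the study used, which the abstract does not state. The function names and the sample values are hypothetical. The `mape` helper uses the standard definition of mean absolute percentage error; the abstract does not say whether the study computed it on raw or log-transformed concentrations.

```python
import math

# Coefficients as reported in the abstract for the stepwise linear CH model.
INTERCEPT = 8.945
COEF_LN_UV254 = 0.558
COEF_LN_PH = -2.37
COEF_LN_TOC = 0.152


def predict_ln_ch(uv254, ph, toc):
    """Return Ln(CH) from the reported equation (natural log assumed)."""
    for name, value in (("uv254", uv254), ("ph", ph), ("toc", toc)):
        if value <= 0:
            raise ValueError(f"{name} must be positive to take its logarithm")
    return (
        INTERCEPT
        + COEF_LN_UV254 * math.log(uv254)
        + COEF_LN_PH * math.log(ph)
        + COEF_LN_TOC * math.log(toc)
    )


def predict_ch(uv254, ph, toc):
    """Back-transform Ln(CH) to a CH value in the study's (unstated) units."""
    return math.exp(predict_ln_ch(uv254, ph, toc))


def mape(observed, predicted):
    """Standard mean absolute percentage error, in percent."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


if __name__ == "__main__":
    # Hypothetical water sample, for illustration only.
    print(predict_ch(uv254=0.05, ph=7.5, toc=3.0))
```

The signs of the coefficients mean that, under this model, predicted CH rises with UV254 and TOC and falls as pH increases.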

