The availability of real-time measurements of primary water quality variables is one of the key challenges in the wastewater treatment industry. Because sensors and probes for on-line measurement of these variables are costly and maintenance-intensive, predicting them with data-driven approaches that take easy-to-measure process variables as inputs has attracted research interest. In this paper, five machine learning techniques, namely a feed-forward artificial neural network, random forest, support vector machine, Gaussian process regression and partial least squares, were used to predict in real time the total ammonium nitrogen concentration during the operation of a hollow fibre membrane contactor. This recently developed technology allows nitrogen to be recovered from nitrogen-rich streams (e.g. the supernatant of anaerobic digesters in wastewater treatment plants) as ammonium sulphate, a marketable fertilizer. These contactors are usually operated in batch mode: the high-nitrogen feed is pumped from a storage tank, where the total ammonium nitrogen concentration decreases progressively as the fertilizer is produced. Knowing the real-time concentration of total ammonium nitrogen in the storage tank would allow the process operation to be optimized, avoiding conservative fixed-time batch durations. The pH, an easy-to-measure process variable usually available in wastewater treatment plants, was used as input to the tested data-driven models, together with two features extracted from it (its derivative and its increments after each reagent dosing).
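As a rough illustration of the two pH-derived features mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes the derivative is a backward finite difference, that a dosing event is identified by the index of the first pH sample recorded after the reagent addition, and that the increment is held constant until the next dosing. The function names `ph_derivative` and `dosing_increments` are hypothetical.

```python
from typing import List, Sequence


def ph_derivative(t: Sequence[float], ph: Sequence[float]) -> List[float]:
    """Backward finite difference dpH/dt (assumed definition).

    The first sample has no predecessor, so its derivative is set to 0.0.
    """
    deriv = [0.0]
    for i in range(1, len(ph)):
        dt = t[i] - t[i - 1]
        # Guard against repeated timestamps to avoid division by zero.
        deriv.append((ph[i] - ph[i - 1]) / dt if dt > 0 else 0.0)
    return deriv


def dosing_increments(ph: Sequence[float], dosing_idx: Sequence[int]) -> List[float]:
    """pH jump caused by the most recent reagent dosing (assumed definition).

    dosing_idx holds the indices of the first sample after each dosing.
    The jump is held until the next dosing; it is 0.0 before the first one.
    """
    dosing = set(dosing_idx)
    current = 0.0
    out = []
    for i in range(len(ph)):
        if i in dosing and i > 0:
            current = ph[i] - ph[i - 1]
        out.append(current)
    return out


def build_features(t, ph, dosing_idx):
    """Assemble one input row per sample: [pH, dpH/dt, last dosing increment]."""
    d = ph_derivative(t, ph)
    inc = dosing_increments(ph, dosing_idx)
    return [[p, di, ii] for p, di, ii in zip(ph, d, inc)]
```

Each row returned by `build_features` could then be fed to any of the regression models listed above; in practice the pH signal would likely need smoothing before differentiation, which this sketch omits.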
The collected database contains 2350 total ammonium nitrogen measurements from 8 complete batches, of which 6 were used for training the data-driven models and 2 for testing them. The measurements range from 987 to 2.5 mg NH4+-N/L, which covers almost the complete range of total ammonium nitrogen concentrations in the membrane contactor. The predictive ability of the developed models was evaluated on the test data set using four indices: the root-mean-square error, the slope and the intercept of the linear fit between the measured and predicted concentrations, and the determination coefficient. The fitted artificial neural network showed strong predictive ability and outperformed the other approaches, with a determination coefficient of 0.99 and the lowest root-mean-square error (19.87 mg/L) on the test set. Permutation variable importance showed that all machine learning techniques depended mainly on the two variables extracted from the pH, its derivative and its increments, which proved more important than the pH itself for predicting the total ammonium nitrogen concentration.
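The four evaluation indices named above can be computed as in the sketch below. It is an illustration under stated assumptions rather than the authors' procedure: the linear fit is taken as an ordinary least-squares regression of predicted on measured concentrations, and the determination coefficient is taken as 1 - SS_res/SS_tot. The function name `evaluate` is hypothetical.

```python
import math
from typing import Dict, Sequence


def evaluate(measured: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Return RMSE, slope, intercept and R^2 for a set of predictions."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n

    # Sums of squares and cross-products about the means.
    s_mm = sum((m - mean_m) ** 2 for m in measured)
    s_mp = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))

    # Least-squares line: predicted = slope * measured + intercept (assumed direction).
    slope = s_mp / s_mm
    intercept = mean_p - slope * mean_m

    # Residual sum of squares between measurements and predictions.
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    rmse = math.sqrt(ss_res / n)

    # Determination coefficient as 1 - SS_res / SS_tot (assumed definition).
    r2 = 1.0 - ss_res / s_mm

    return {"rmse": rmse, "slope": slope, "intercept": intercept, "r2": r2}
```

Under these definitions an ideal model gives a slope of 1, an intercept of 0, an RMSE of 0 and a determination coefficient of 1, so the four indices together show both scatter and systematic bias.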