Multiple-layer statistical methodology for developing data-driven models of anaerobic digestion process

Moonil Kim, Fenghao Cui

doi:10.1016/j.jenvman.2023.119153

Abstract

When modelling anaerobic digestion, ineffective data handling and inappropriate selection of modelling parameters can undermine model reliability. In this study, a multilayer statistical methodology built on regression-based machine learning was introduced to systematically support the development of anaerobic digestion models. Four statistical techniques were applied layer by layer to develop and validate the models: cubic smoothing splines (to reconstruct missing data), principal component analysis (to identify correlated parameters), analysis of variance (to test for differences among datasets), and linear regression (to build the data-driven models). Experimental data from the long-term operation of lab-scale (350 days), pilot-scale (150 days), and full-scale (750 days) reactors were used to demonstrate the modelling process. Multivariate data-driven models were developed by passing the experimental and monitored data through this process, and the resulting models could predict biogas production and effluent chemical oxygen demand (COD) during anaerobic digestion. The statistical analyses verified the modelling hypotheses, prevented the development of invalid models, and ensured data integrity and parameter validity. Multiple linear regression on principal components showed that biogas production from food waste was influenced by the variances of the nitrogen and organic concentrations, but not by the COD to total nitrogen (C/N) ratio. In validation, the model developed with lab-scale reactor data showed relatively high accuracy, with R², SSE, and RMSE values of 0.86, 34.45, and 0.72, respectively.
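The first layer, reconstructing missing data with cubic smoothing splines, can be illustrated with a minimal NumPy sketch. It assumes the standard penalised (Reinsch) formulation, which minimises the squared residuals plus `lam` times the integrated squared second derivative; the function name `smoothing_spline_fill`, the smoothing parameter `lam`, and the requirement that gaps lie inside the observed time range are illustrative assumptions, not details reported in the study.

```python
import numpy as np


def smoothing_spline_fill(x, y, lam=1.0):
    """Fill NaN gaps in y using a cubic smoothing spline fitted to the observed points.

    Minimises sum_i (y_i - g(x_i))**2 + lam * integral g''(t)**2 dt
    (Reinsch / Green-Silverman formulation). Assumes at least 3 observed
    points, strictly increasing x, and gaps inside the observed x range.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    obs = ~np.isnan(y)
    xo, yo = x[obs], y[obs]
    n = xo.size
    h = np.diff(xo)  # knot spacings h_i = x_{i+1} - x_i

    # Q (n x n-2) maps values to scaled second differences; R (n-2 x n-2) is tridiagonal.
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for j in range(1, n - 1):
        c = j - 1  # column/row index for interior knot j
        Q[j - 1, c] = 1.0 / h[j - 1]
        Q[j, c] = -1.0 / h[j - 1] - 1.0 / h[j]
        Q[j + 1, c] = 1.0 / h[j]
        R[c, c] = (h[j - 1] + h[j]) / 3.0
        if c + 1 < n - 2:
            R[c, c + 1] = R[c + 1, c] = h[j] / 6.0

    # Roughness penalty K = Q R^-1 Q^T; smoothed knot values g = (I + lam K)^-1 y.
    K = Q @ np.linalg.solve(R, Q.T)
    g = np.linalg.solve(np.eye(n) + lam * K, yo)
    # Second derivatives at the knots; zero at both ends (natural spline).
    gamma = np.zeros(n)
    gamma[1:-1] = np.linalg.solve(R, Q.T @ g)

    filled = y.copy()
    for k in np.flatnonzero(~obs):
        t = x[k]
        i = int(np.clip(np.searchsorted(xo, t) - 1, 0, n - 2))  # interval [xo[i], xo[i+1]]
        a, b, hi = t - xo[i], xo[i + 1] - t, h[i]
        # Evaluate the cubic on this interval from knot values g and second derivatives gamma.
        filled[k] = (a * g[i + 1] + b * g[i]) / hi - a * b / 6.0 * (
            (1.0 + a / hi) * gamma[i + 1] + (1.0 + b / hi) * gamma[i]
        )
    return filled


# Hypothetical example: a linear series with one missing observation.
x = np.arange(10.0)
y = 2.0 * x + 1.0
y[4] = np.nan
print(smoothing_spline_fill(x, y)[4])
```

Only the missing entries are replaced; observed values are kept as measured. A larger `lam` gives a smoother curve (tending towards a straight-line fit), while `lam=0` reduces to an interpolating natural cubic spline; in practice the smoothing parameter is usually chosen by (generalised) cross-validation.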

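The remaining layers (one-way ANOVA to compare datasets, principal component analysis followed by multiple linear regression on the component scores, and the R², SSE, and RMSE validation metrics) can be sketched in the same way. This is a hedged NumPy sketch, not the authors' implementation: the helper names `fit_pcr`, `predict_pcr`, `validation_metrics`, and `one_way_anova_f`, the column standardisation before PCA, and the definitions R² = 1 - SSE/SST and RMSE = sqrt(SSE/n) are assumptions, and the example data are synthetic.

```python
import numpy as np


def fit_pcr(X, y, n_components):
    """Multiple linear regression on the leading principal components of X."""
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mu) / sd                                   # standardise each parameter (assumed)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)    # PCA via SVD
    P = Vt[:n_components].T                             # loadings (p x k)
    T = Z @ P                                           # principal-component scores
    A = np.column_stack([np.ones(len(y)), T])           # intercept + scores
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    explained = s ** 2 / np.sum(s ** 2)                 # variance fraction per component
    return {"mu": mu, "sd": sd, "P": P, "coef": coef, "explained": explained}


def predict_pcr(model, X):
    """Project new data with the training mean, scale, and loadings, then predict."""
    T = ((X - model["mu"]) / model["sd"]) @ model["P"]
    return model["coef"][0] + T @ model["coef"][1:]


def validation_metrics(y_obs, y_pred):
    """Assumed definitions: R2 = 1 - SSE/SST, SSE = sum of squared residuals, RMSE = sqrt(SSE/n)."""
    resid = y_obs - y_pred
    sse = float(np.sum(resid ** 2))
    sst = float(np.sum((y_obs - np.mean(y_obs)) ** 2))
    return 1.0 - sse / sst, sse, float(np.sqrt(sse / len(y_obs)))


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic: between-group mean square over within-group mean square."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    pooled = np.concatenate(groups)
    grand = pooled.mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)
    df_between, df_within = len(groups) - 1, len(pooled) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical synthetic example: 50 samples of 3 process parameters.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5])
model = fit_pcr(X[:40], y[:40], n_components=3)                         # calibrate on 40 samples
r2, sse, rmse = validation_metrics(y[40:], predict_pcr(model, X[40:]))  # validate on 10
print(f"R2={r2:.3f}, SSE={sse:.3g}, RMSE={rmse:.3g}")
print(f"F={one_way_anova_f([1, 2, 3], [4, 5, 6]):.2f}")
```

Keeping fewer components than parameters (`n_components` < p) is what lets the regression discard correlated, low-variance directions; with all components retained, as in this toy example, the fit reduces to ordinary multiple linear regression. The F statistic would normally be compared against an F distribution with (`df_between`, `df_within`) degrees of freedom to obtain a p-value.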
Full Text