Causal inference-assisted machine learning was used to predict the protein production capacity of photosynthetic bacteria (PSB) and to identify its key driving factors. The extreme gradient boosting algorithm predicted protein content effectively, while the gradient boosting decision tree algorithm performed best in predicting protein production, protein productivity, and protein energy yields. Driving factors and their suitable ranges were identified for protein content (pH 6.0–7.5, hydraulic retention time (HRT) < 3.8 d), protein production (biomass > 1.7 g, organic loading rate (OLR) > 9.2 g L⁻¹ d⁻¹, temperature 26.7–35.0 °C), protein productivity (HRT < 3.5 d, biomass > 1.6 g, OLR > 10.0 g L⁻¹ d⁻¹), and protein energy yields (light energy 0.1–4.4 kWh, biomass 1.7–65.0 g, chemical oxygen demand (COD) 0.1–2.5 g L⁻¹). Illuminance, dissolved oxygen, COD, and the COD/total nitrogen ratio were identified as causal factors influencing protein production. Two-dimensional partial dependence plots revealed the interactions between pairs of driving factors. This study improves understanding of PSB protein production and offers insights for wastewater treatment and sustainable resource development.
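The suitable ranges reported above can be encoded as simple rules and used to screen an operating condition. The following is a minimal sketch, not the authors' method: the factor names (`HRT_d`, `OLR_g_L_d`, and so on), the `unmet_conditions` helper, and the choice to treat "a–b" ranges as inclusive and "<"/">" bounds as strict are all hypothetical. The ranges are associations learned by the models, so meeting them does not guarantee high protein output.

```python
import operator

# Rules transcribed from the reported suitable ranges: (factor, comparison, threshold).
# Factor names are hypothetical labels. "a–b" ranges are treated as inclusive and
# "<"/">" bounds as strict; both choices are assumptions.
SUITABLE_RANGES = {
    "protein_content": [
        ("pH", operator.ge, 6.0), ("pH", operator.le, 7.5),
        ("HRT_d", operator.lt, 3.8),
    ],
    "protein_production": [
        ("biomass_g", operator.gt, 1.7),
        ("OLR_g_L_d", operator.gt, 9.2),
        ("temperature_C", operator.ge, 26.7), ("temperature_C", operator.le, 35.0),
    ],
    "protein_productivity": [
        ("HRT_d", operator.lt, 3.5),
        ("biomass_g", operator.gt, 1.6),
        ("OLR_g_L_d", operator.gt, 10.0),
    ],
    "protein_energy_yield": [
        ("light_energy_kWh", operator.ge, 0.1), ("light_energy_kWh", operator.le, 4.4),
        ("biomass_g", operator.ge, 1.7), ("biomass_g", operator.le, 65.0),
        ("COD_g_L", operator.ge, 0.1), ("COD_g_L", operator.le, 2.5),
    ],
}

# Human-readable symbols for reporting which rule failed.
SYMBOLS = {operator.lt: "<", operator.le: "<=", operator.gt: ">", operator.ge: ">="}


def unmet_conditions(target, conditions):
    """Return the rules for `target` that `conditions` (factor -> value) do not satisfy.

    A factor missing from `conditions` counts as unmet.
    """
    unmet = []
    for factor, compare, threshold in SUITABLE_RANGES[target]:
        value = conditions.get(factor)
        if value is None or not compare(value, threshold):
            unmet.append(f"{factor} {SYMBOLS[compare]} {threshold}")
    return unmet


if __name__ == "__main__":
    reactor = {"HRT_d": 3.0, "biomass_g": 2.0, "OLR_g_L_d": 9.5}
    print(unmet_conditions("protein_productivity", reactor))
```

For the example reactor, HRT and biomass fall within the productivity ranges, but the OLR of 9.5 g L⁻¹ d⁻¹ is below the 10.0 threshold, so the script prints `['OLR_g_L_d > 10.0']`. Storing each bound as an explicit comparison keeps one-sided and two-sided ranges in one uniform structure.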