Anaerobic biodegradation rates (half-lives) of organic chemicals are pivotal for environmental risk assessment and remediation. Traditional experimental evaluation, which requires prolonged incubation under oxygen-free conditions, struggles to keep pace with emerging contaminants. Data-driven machine learning (ML) models are a promising complement. However, reported quantitative structure-biodegradation relationships and ML models for anaerobic biodegradation are mostly based on small data sets (<100 records) and neglect experimental conditions, which usually limits their predictive performance. This work aimed to develop ML models for predicting the biodegradation half-lives of organic pollutants in anaerobic environments (i.e., sediment/soil and sludge). Focusing on important features of both chemicals and experimental conditions, we first curated two data sets from the literature, one for sediment/soil (SED) and the other for sludge (SLD), together covering 978 records for 206 chemicals, and then conducted a meta-analysis. Next, for SED we built a binary classification model (half-life of 30 days as the cutoff) with an accuracy of 81% and a regression model with an R² of 0.56, both based on LightGBM; the corresponding SLD models, based on Extra tree, achieved an accuracy of 80% and an R² of 0.31. The model interpretations underscored the significance of experimental conditions (e.g., temperature and inoculum dosage), as evidenced by their high feature importance. The models were also found to correctly capture the effects of chemical substructures: for example, branched structures and aromatic rings prolonged half-lives, whereas methyl groups and ortho-substitution on rings shortened them. The applicability domains of the models were also defined, allowing reasonable half-life predictions for 41% (SED) or 67% (SLD) of more than 4000 persistent, bioaccumulative, and toxic chemicals. Overall, this study pioneers ML models for predicting anaerobic biodegradation half-lives, offering valuable support for the future evaluation and implementation of anaerobic biodegradation of chemicals.
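To make the two reported metrics concrete, the following minimal sketch shows how half-lives could be binarized at the 30-day cutoff and how accuracy and R² are conventionally computed. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the choice to label half-lives strictly greater than the cutoff as the "slow" class, and the example values are all hypothetical.

```python
CUTOFF_DAYS = 30.0  # classification cutoff stated in the abstract


def label_half_life(half_life_days, cutoff=CUTOFF_DAYS):
    """Return 1 ("slow") if the half-life exceeds the cutoff, else 0 ("fast").

    Assumption: treating the cutoff value itself as "fast" is our choice;
    the abstract does not say how a half-life of exactly 30 days is handled.
    """
    return 1 if half_life_days > cutoff else 0


def accuracy(y_true, y_pred):
    """Fraction of records whose predicted class matches the observed class."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted half-lives (days), for illustration only.
observed = [5.0, 45.0, 120.0, 12.0]
predicted = [8.0, 60.0, 20.0, 10.0]

observed_classes = [label_half_life(h) for h in observed]
predicted_classes = [label_half_life(h) for h in predicted]

print("accuracy:", accuracy(observed_classes, predicted_classes))
print("R^2:", r_squared(observed, predicted))
```

In practice the classifier and regressor would be trained on chemical descriptors together with experimental conditions such as temperature and inoculum dosage; the sketch covers only the labeling and scoring steps.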