This study aimed to develop machine-learning-based quantitative structure-biodegradability relationship (QSBR) models for predicting the primary and ultimate biodegradation rates of organic chemicals, which are essential parameters for environmental risk assessment. For this purpose, highly consistent experimental primary and ultimate biodegradation rates were compiled for 173 organic compounds. A large number of descriptors were calculated with a collection of quantum and computational chemistry software tools to achieve comprehensive representation and interpretability. Following a pre-screening process, multiple QSBR models were developed for both the primary and ultimate endpoints using three algorithms: extreme gradient boosting (XGBoost), support vector machine (SVM), and multiple linear regression (MLR). Furthermore, a unified QSBR model was constructed using the knowledge transfer technique and XGBoost. The results demonstrated that all QSBR models developed in this study performed well. In particular, the SVM models exhibited a high level of goodness of fit (coefficient of determination on the training set of 0.973 for primary and 0.980 for ultimate), robustness (leave-one-out cross-validated coefficient of 0.953 for primary and 0.967 for ultimate), and external predictive ability (external explained variance of 0.947 for primary and 0.958 for ultimate). The knowledge transfer technique enhanced model performance by learning from the properties of the two biodegradation endpoints. Williams plots were used to visualize the applicability domains of the models. Through SHapley Additive exPlanations (SHAP) analysis, this study identified key features affecting biodegradation rates. Notably, MDEO-12, APC2D1_C_O, and other features contributed positively to primary biodegradation, while AATS0v, AATS2v, and others inhibited it. For ultimate biodegradation, features such as No. of Rotatable Bonds, APC2D1_C_O, and minHBa were positive contributors, while C1SP3, Halogen Ratio, GGI4, and others hindered the process. The study also quantified the contribution of each feature to the predictions for individual chemicals. This research provides valuable tools for predicting both primary and ultimate biodegradation rates while offering insights into the underlying mechanisms.
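
The validation statistics quoted in the abstract (training-set coefficient of determination, leave-one-out cross-validated coefficient) and the leverage values plotted on the horizontal axis of a Williams plot can be illustrated with a minimal sketch. This is not the authors' pipeline: a one-descriptor least-squares line stands in for the SVM, XGBoost, and MLR models, the data are invented toy values, and all function names are hypothetical. The leverage cutoff h* = 3(p + 1)/n is a commonly used convention in QSAR work and is assumed here, since the abstract does not state which cutoff was applied.

```python
# Illustrative sketch only: toy data and a single-descriptor linear model,
# not the descriptors, data, or models used in the study.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_mean - b * x_mean, b


def r_squared(xs, ys):
    """Goodness of fit on the training set: 1 - SS_res / SS_tot."""
    a, b = fit_line(xs, ys)
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def q_squared_loo(xs, ys):
    """Leave-one-out Q^2: refit without compound i, then predict compound i.

    Uses the full-set mean in SS_tot; other definitions of Q^2 exist.
    """
    y_mean = sum(ys) / len(ys)
    press = 0.0  # predictive residual sum of squares
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - press / ss_tot


def leverages(xs):
    """Hat-matrix diagonal for a one-descriptor model with intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return [1.0 / n + (x - x_mean) ** 2 / sxx for x in xs]


# Hypothetical descriptor values and log-scale biodegradation rates.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
rate = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]

r2 = r_squared(descriptor, rate)
q2 = q_squared_loo(descriptor, rate)
h = leverages(descriptor)

# Assumed warning leverage: h* = 3 * (p + 1) / n, with p = 1 descriptor.
h_star = 3 * (1 + 1) / len(descriptor)
outside_domain = [i for i, h_i in enumerate(h) if h_i > h_star]

print(f"R2 = {r2:.3f}, Q2_LOO = {q2:.3f}")
print("compounds above the warning leverage:", outside_domain)
```

For a least-squares model, the leave-one-out statistic can never exceed the training-set statistic, which is why a small gap between the two (as reported for the SVM models) is read as a sign of robustness. In a full Williams plot, each compound's standardized residual would be plotted against its leverage, with compounds beyond h* or beyond a residual cutoff flagged as outside the applicability domain.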