Prediction of free radical reactions toward organic pollutants with easily accessible molecular descriptors

Guoyang Zhang, Qiang Zhu, Hongcen Zheng, Shujuan Zhang, Jing Ma

doi:10.1016/j.chemosphere.2023.140660

Abstract

Machine learning (ML) is becoming an efficient tool for predicting the fate of aquatic contaminants owing to the growing availability of big data. However, whether ML can “learn” the differences in reactivity among different free radicals has not yet been tested. In this work, the effectiveness of combining ML algorithms with molecular fingerprints to predict the reactivity of three free radicals was evaluated. First, a dataset containing 211 organic pollutants and their respective rate constants with the carbonate radical (CO3•−) was used to develop predictive models using both linear regression and ML methods. Topological atomic alignment information, in the form of Molecular ACCess System (MACCS) keys and Morgan fingerprints, and electronic structure features (the energy levels of the lowest unoccupied and highest occupied molecular orbitals, E_LUMO and E_HOMO, and the energy gap between them) both gave satisfactory predictive performance (ML model using the Random Forest algorithm with MACCS: RMSE_test = 0.787; linear regression model with energy levels: RMSE_test = 0.641). Additionally, the model interpretation correctly showed that the key reactivity features for CO3•− were closer to those for SO4•− than to those for •OH. These results suggest that combining ML algorithms with easily accessible molecular fingerprints would be a powerful tool for accurately predicting radical reactions with organic compounds.
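The RMSE_test values quoted above are root-mean-square errors on a held-out test set. As a minimal sketch of how such a figure is obtained, the Python snippet below computes an RMSE from observed and predicted values. The function name `rmse` and the numbers are illustrative assumptions, not data from the paper, and treating the target as log10 of the rate constant is also an assumption, since the abstract does not state the scale.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the squared residuals, then take the square root.
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical held-out values (assumed to be log10 rate constants),
# made up purely for illustration.
observed = [6.0, 7.5, 8.0, 5.5]
predicted = [6.5, 7.0, 8.5, 5.0]

test_rmse = rmse(observed, predicted)
print(f"RMSE_test = {test_rmse:.3f}")
```

A lower value means the predictions sit closer to the measured values on compounds the model did not see during training, which is the sense in which the two models in the abstract are compared.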

Full Text