Using machine learning to classify the immunosuppressive activity of per- and polyfluoroalkyl substances

Yuxin Xuan, Yulu Wang, Rui Li, Yuyan Zhong, Na Wang, Lingyin Zhang, Qian Chen, Shuling Yu, Jintao Yuan

doi:10.1080/15376516.2024.2387733

Abstract

Per- and polyfluoroalkyl substances (PFASs), a class of persistent organic pollutants, have immunosuppressive effects, and evaluating these effects has been a focus of regulatory toxicology. In this investigation, 146 PFASs (immunosuppressive or nonimmunosuppressive) and their corresponding concentration gradients were collected from the literature, and their structures were characterized using Dragon descriptors. Feature importance analysis and stepwise feature elimination were used for feature selection. Three machine learning (ML) methods, namely Random Forest (RF), Extreme Gradient Boosting Machine (XGB), and Categorical Boosting Machine (CB), were used for model development. Model interpretability was explored through feature importance analysis and correlation analysis. All three models exhibited excellent performance; the best-performing model, RF, achieved an average AUC of 0.9720 on the testing set. The feature importance analysis identified concentration, SpPosA_X, IVDE, R2s, and SIC2 as the crucial molecular features. Applicability domain analysis was also performed to determine the boundaries within which the model's predictions are reliable. In conclusion, this study is the first application of ML models to investigate the immunosuppressive activity of PFASs. The variables used in the models can help explain the mechanism of PFAS immunosuppressive activity, allow researchers to assess the immunosuppressive potential of large numbers of PFASs more effectively, and thus better guide environmental and health risk assessment.
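The workflow summarized above (importance-based stepwise feature elimination, a Random Forest classifier, and AUC on a held-out testing set) can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the Dragon descriptors and concentration values, and the helper name `stepwise_eliminate`, the hyperparameters, and the split ratio are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split


def stepwise_eliminate(X, y, min_features=5, seed=0):
    """Drop the least important feature one at a time (hypothetical helper).

    Returns the column indices of the subset with the best 5-fold
    cross-validated AUC, together with that AUC.
    """
    kept = list(range(X.shape[1]))
    best_auc, best_kept = -1.0, kept[:]
    while len(kept) >= min_features:
        rf = RandomForestClassifier(n_estimators=100, random_state=seed)
        auc = cross_val_score(rf, X[:, kept], y, cv=5, scoring="roc_auc").mean()
        # ">=" prefers the smaller subset when two subsets tie on AUC.
        if auc >= best_auc:
            best_auc, best_kept = auc, kept[:]
        # Refit on the current subset and remove its weakest feature.
        rf.fit(X[:, kept], y)
        kept.pop(int(np.argmin(rf.feature_importances_)))
    return best_kept, best_auc


# Synthetic stand-in data (assumption): 300 samples, 20 numeric features,
# binary labels playing the role of immunosuppressive / nonimmunosuppressive.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Select features on the training split only, so the test split stays unseen.
selected, cv_auc = stepwise_eliminate(X_train, y_train)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train[:, selected], y_train)
test_auc = roc_auc_score(y_test, model.predict_proba(X_test[:, selected])[:, 1])

print(f"kept {len(selected)} features, CV AUC={cv_auc:.3f}, test AUC={test_auc:.3f}")
```

Running the selection on the training split alone is a deliberate choice in this sketch: selecting features on the full data set would leak information from the testing set into the reported AUC.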

Full Text