Abstract

Persistent, mobile, and toxic (PMT) substances and very persistent and very mobile (vPvM) substances can be transported over long distances from various sources, increasing the risk to public health. Rapid, high-throughput screening of PMT/vPvM substances is therefore needed to support risk prevention and mitigation measures. Herein, we construct a machine learning-based screening system that integrates five models for high-throughput classification of PMT/vPvM substances. The models are built on 44 971 substances using conventional learning, deep learning, and ensemble learning algorithms; among these, LightGBM and XGBoost outperform the other algorithms, with metrics exceeding 0.900. The models are readily interpretable: the number of free halogen atoms (fr_halogen) and the logarithm of the partition coefficient (MolLogP) emerge as the two most critical molecular descriptors, representing the persistence and the mobility of substances, respectively. Our screening system exhibits strong generalization capability, with an area under the receiver operating characteristic curve (AUROC) above 0.951, and is successfully applied to persistent organic pollutants (POPs), prioritized PMT/vPvM substances, and pesticides. The screening system constructed in this study can serve as an efficient and reliable tool for high-throughput risk assessment and for prioritizing emerging contaminants for management.
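
For readers unfamiliar with the AUROC metric cited above, the following is a minimal sketch of how it can be computed for a binary classifier: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `auroc` and the toy labels and scores are illustrative assumptions and are not taken from the study's data or code.

```python
def auroc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of classifier scores, higher meaning "more positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)
print(f"AUROC on toy data: {toy_auroc:.3f}")
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; for a data set of tens of thousands of substances a rank-based implementation would be the more practical choice.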
