Abstract

Nitroaromatic compounds (NACs) are a significant source of organic pollutants in the environment. In this study, a comprehensive dataset of 371 NACs with rat oral median lethal doses (LD50s) was compiled, and both binary and multiclass classification models were established from it. The prediction models were built by combining seven machine learning algorithms with six molecular fingerprints. Among the binary classification models, the top ten achieved overall accuracies of 0.823 to 0.874 in 10-fold cross-validation on the training set. Among the multiclass classification models, the combination of the graph fingerprint and random forest (Graph-RF) performed best, with AUC values of 0.929 and 0.956 for the training and test sets, respectively. Predictive performance was further evaluated on a true external set of 1366 NACs, 96.6% of which fell within the applicability domain. We then identified the structural features influencing acute oral toxicity using information gain and substructure frequency analysis. Finally, we identified highly toxic compounds on the basis of structural alerts and, through structural modification, transformed a representative highly toxic compound into low-toxicity alternatives. Overall, the constructed models can support environmental risk assessment and the design of green and safe chemicals.
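The abstract names information gain and substructure frequency analysis as the basis for finding toxicity-related structural features but does not give the formulas used. A minimal sketch, assuming each fingerprint bit is a binary substructure indicator and using the standard Shannon-entropy definition of information gain, might look like the following. The function names, class labels, and toy data are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_bits, labels):
    """Information gain of one binary substructure bit w.r.t. the toxicity class.

    IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v)
    """
    if len(feature_bits) != len(labels):
        raise ValueError("feature_bits and labels must have the same length")
    n = len(labels)
    conditional = 0.0
    for value in set(feature_bits):
        # Labels of the compounds where this bit takes the given value.
        subset = [y for x, y in zip(feature_bits, labels) if x == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


def substructure_frequency(feature_bits, labels, target_class):
    """Fraction of compounds in target_class that carry the substructure (bit == 1)."""
    in_class = [x for x, y in zip(feature_bits, labels) if y == target_class]
    return sum(in_class) / len(in_class) if in_class else 0.0


# Hypothetical toy data: 4 "toxic" and 4 "nontoxic" compounds.
labels = ["toxic"] * 4 + ["nontoxic"] * 4
alert_bit = [1, 1, 1, 1, 0, 0, 0, 0]  # present only in toxic compounds
noise_bit = [1, 0, 1, 0, 1, 0, 1, 0]  # spread evenly across both classes

print(information_gain(alert_bit, labels))  # 1.0
print(information_gain(noise_bit, labels))  # 0.0
```

A bit that perfectly separates the classes recovers the full label entropy (1 bit here), while an evenly distributed bit carries no information; ranking fingerprint bits by this score, together with how often each substructure appears in the highly toxic class, is one plausible way to shortlist candidate structural alerts.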
