Abstract

Carcinogenicity is among the chemical properties of greatest concern for human health, so it is important to identify carcinogenic chemicals as early as possible. In this study, 829 diverse compounds with rat carcinogenicity data were collected from the Carcinogenic Potency Database (CPDB). Using six types of fingerprints to represent the molecules and five machine learning methods, 30 binary and ternary classification models were built to predict chemical carcinogenicity. The models were evaluated on an external validation set of 87 chemicals from the ISSCAN database. The best binary model, built with MACCS keys and the kNN algorithm, reached a predictive accuracy of 83.91 %; the best ternary model, also built with MACCS keys and kNN, reached an overall accuracy of 80.46 %. The best binary and ternary models were then used to estimate the carcinogenicity of 2251 tobacco smoke components. The binary model predicted 981 compounds as carcinogens, while the ternary model predicted 110 compounds as strong carcinogens and 807 as weak carcinogens. These results indicate that the models could be helpful for predicting chemical carcinogenicity.
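
To make the kNN-on-fingerprints idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each fingerprint is stored as the set of its "on" bit indices, and it uses Tanimoto similarity with a majority vote over k = 3 neighbours. The abstract states neither the similarity measure nor k, so both are assumptions, and the function names and toy data are purely illustrative.

```python
from collections import Counter


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def knn_predict(query: frozenset, training: list, k: int = 3) -> str:
    """Label the query by majority vote among its k most similar training items.

    `training` is a list of (fingerprint, label) pairs. The value k = 3 is an
    assumption for illustration; the abstract does not report it.
    """
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(predictions: list, truths: list) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(p == t for p, t in zip(predictions, truths))
    return correct / len(truths)


# Hypothetical toy fingerprints (bit indices are made up, not real MACCS keys).
training = [
    (frozenset({1, 2, 3}), "carcinogen"),
    (frozenset({1, 2, 4}), "carcinogen"),
    (frozenset({2, 3, 5}), "carcinogen"),
    (frozenset({7, 8, 9}), "non-carcinogen"),
    (frozenset({7, 8, 10}), "non-carcinogen"),
    (frozenset({8, 9, 11}), "non-carcinogen"),
]

if __name__ == "__main__":
    print(knn_predict(frozenset({1, 2, 3, 4}), training))
    print(knn_predict(frozenset({7, 8, 9, 10}), training))
```

A ternary variant would use the same vote with three labels (for example strong, weak, and non-carcinogen) in place of two. In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets.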
