Abstract

In this study, we developed new computational models for predicting DNA adducts by using a significantly more diverse training data set of 217 DNA adducts and 1024 non-DNA adducts and by applying five machine learning methods: support vector machine (SVM), k-nearest neighbour, artificial neural networks, logistic regression and continuous kernel discrimination. The molecular descriptors used for DNA adduct prediction were selected from a pool of 548 descriptors by a multi-step hybrid feature selection method that combines the Fisher score with Monte Carlo simulated annealing. Some of the selected descriptors are consistent with the structural and physicochemical properties reported to be important for DNA adduct formation. The y-scrambling method was used to test for chance correlation in the developed SVM model. Fivefold cross-validation of these machine learning methods gave prediction accuracies of 64.1–82.5% for DNA adducts and 95.1–97.6% for non-DNA adducts, and the prediction accuracies on the external test set were 78.2–100% for DNA adducts and 92.6–98.4% for non-DNA adducts. Our study suggests that the tested machine learning methods are potentially useful for DNA adduct identification.
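The Fisher score filtering step mentioned above can be illustrated with a minimal sketch. It assumes the common two-class form of the score (squared difference of class means divided by the sum of class variances); the abstract does not state which variant was used. The function names, descriptor names and toy values below are hypothetical and are not taken from the study.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Two-class Fisher score (assumed form): (mean gap)^2 / (sum of class variances)."""
    pos = [v for v, y in zip(values, labels) if y == 1]  # e.g. DNA adducts
    neg = [v for v, y in zip(values, labels) if y == 0]  # e.g. non-DNA adducts
    if not pos or not neg:
        raise ValueError("both classes must be present")
    spread = pvariance(pos) + pvariance(neg)
    if spread == 0:
        # Constant within each class: perfectly separating or useless.
        return float("inf") if mean(pos) != mean(neg) else 0.0
    return (mean(pos) - mean(neg)) ** 2 / spread


def rank_descriptors(table, labels):
    """Return descriptor names sorted by decreasing Fisher score."""
    scores = {name: fisher_score(col, labels) for name, col in table.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Made-up toy data: two hypothetical descriptors for six compounds.
labels = [1, 1, 1, 0, 0, 0]
table = {
    "descriptor_a": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],  # class means differ
    "descriptor_b": [1.0, 5.0, 3.0, 2.0, 4.0, 3.0],  # class means coincide
}
ranking = rank_descriptors(table, labels)
```

In a hybrid scheme of the kind described, a filter like this would typically reduce the descriptor pool before a search such as simulated annealing explores subsets of the survivors; that second stage is not shown here.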
