Abstract

Deep neural networks (DNNs) are the major drivers of recent progress in artificial intelligence. They have emerged as the machine-learning method of choice for image and speech recognition problems, and their potential has raised the expectation of similar breakthroughs in other fields of study. In this work, we compared three machine-learning methods (DNN; random forest, a popular conventional method; and variable nearest neighbor, arguably the simplest method) in their ability to predict molecular activities in 21 in vivo and in vitro data sets. Surprisingly, the overall performance of the three methods was similar. For molecules with structurally close near neighbors in the training sets, all methods gave reliable predictions, whereas for molecules increasingly dissimilar to the training molecules, all three methods gave progressively poorer predictions. For molecules sharing little to no structural similarity with the training molecules, all three methods gave a nearly constant value, approximately the average activity of all training molecules, as their predictions. The results confirm conclusions deduced from analyzing molecular applicability domains for accurate predictions, namely that the most important determinant of the accuracy of a prediction for a molecule is its similarity to the training samples. This highlights the fact that even in the age of deep learning, developing a truly high-quality model relies less on the choice of machine-learning approach and more on experimental efforts to generate sufficient training data for structurally diverse compounds. The results also indicate that the distance to training molecules offers a natural and intuitive basis for defining applicability domains to flag reliable and unreliable quantitative structure-activity relationship predictions.
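
The idea of using distance to the training molecules as an applicability-domain flag can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that molecules are represented as sets of fingerprint feature ids, that distance is the Tanimoto distance, and that predictions are distance-weighted averages with a Gaussian-style weight. The names `tanimoto_distance` and `predict_with_domain` and the parameters `h` and `max_distance` are hypothetical, and their default values are arbitrary.

```python
import math


def tanimoto_distance(fp_a, fp_b):
    """Return 1 - |A & B| / |A | B| for fingerprints given as sets of feature ids."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # assumption: two empty fingerprints count as identical
    return 1.0 - len(fp_a & fp_b) / union


def predict_with_domain(query_fp, training, h=0.3, max_distance=0.6):
    """Distance-weighted nearest-neighbor prediction with a reliability flag.

    training: non-empty list of (fingerprint_set, activity) pairs.
    h: assumed smoothing factor of the weight exp(-(d / h) ** 2).
    max_distance: assumed cutoff; a query whose nearest training molecule
        is farther than this is flagged as outside the applicability domain.
    Returns (prediction, nearest_distance, in_domain).
    """
    if not training:
        raise ValueError("training set must not be empty")
    distances = [tanimoto_distance(query_fp, fp) for fp, _ in training]
    weights = [math.exp(-((d / h) ** 2)) for d in distances]
    total = sum(weights)  # strictly positive because every distance is <= 1
    prediction = sum(w * y for w, (_, y) in zip(weights, training)) / total
    nearest = min(distances)
    return prediction, nearest, nearest <= max_distance


# Toy data: hypothetical fingerprints and activities, for illustration only.
toy_training = [({1, 2, 3}, 5.0), ({1, 2, 4}, 7.0), ({10, 11}, 1.0)]
close_query = predict_with_domain({1, 2, 3}, toy_training)
distant_query = predict_with_domain({99}, toy_training)
```

In this sketch, a query that shares no features with any training molecule is at distance 1 from all of them, so every weight is equal and the prediction collapses to the plain mean of the training activities. This mirrors the nearly constant predictions described above for structurally dissimilar molecules, and the flag marks such predictions as unreliable.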
