Abstract
The notion of activity cliffs is an intuitive approach to characterizing structural features that play a key role in modulating the biological activity of a molecule. A variety of methods, such as SALI and SARI, have been described to quantitatively characterize activity cliffs. However, these methods are primarily retrospective in nature: they highlight cliffs that are already present in the data set. The current study focuses on using a pairwise characterization of a data set to train a model that predicts whether a new molecule will form an activity cliff with one or more members of the data set. Because the approach predicts a value for pairs of objects rather than for the individual objects themselves (n molecules yield n(n − 1)/2 pairs), it allows robust models to be built even for small structure-activity relationship data sets. We extracted structure-activity data for several ChEMBL assays and developed random forest models that predict SALI values from pairwise combinations of molecular descriptors. The models exhibited reasonable RMSEs, though, surprisingly, performance on the more significant cliffs tended to be better than on the lesser ones. While the models do not exhibit very high levels of accuracy, our results indicate that they are able to prioritize molecules in terms of their ability to form activity cliffs, thus serving as a tool to prospectively identify activity cliffs.
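The pairwise setup described above can be sketched as follows. This is a minimal illustration that assumes fingerprints are given as sets of on-bit indices, similarity is Tanimoto, and SALI takes its usual form, SALI(i, j) = |A_i − A_j| / (1 − sim(i, j)), which is undefined for pairs with identical fingerprints. The helpers `pair_features` and `cliff_score` are hypothetical: combining two descriptor vectors as their element-wise absolute difference concatenated with their sum, and scoring a new molecule by its maximum predicted SALI over the data set, are illustrative choices and not necessarily the ones used in the study. The random forest itself is omitted; any regressor's prediction function could be passed in as `predict`.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence, Tuple

# A molecule is (fingerprint on-bits, descriptor vector, activity), e.g. pIC50.
Molecule = Tuple[FrozenSet[int], Sequence[float], float]


def tanimoto(fp_a: FrozenSet[int], fp_b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two fingerprints stored as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0


def sali(act_i: float, act_j: float, sim: float) -> float:
    """SALI = |A_i - A_j| / (1 - sim); infinite when the fingerprints are identical."""
    if sim >= 1.0:
        return float("inf")
    return abs(act_i - act_j) / (1.0 - sim)


def pair_features(desc_i: Sequence[float], desc_j: Sequence[float]) -> List[float]:
    """Hypothetical symmetric pairing: |d_i - d_j| followed by d_i + d_j.

    Symmetry means (i, j) and (j, i) map to the same feature vector.
    """
    diffs = [abs(a - b) for a, b in zip(desc_i, desc_j)]
    sums = [a + b for a, b in zip(desc_i, desc_j)]
    return diffs + sums


def build_pair_dataset(molecules: Sequence[Molecule]) -> Tuple[List[List[float]], List[float]]:
    """Turn n molecules into up to n(n-1)/2 (pair features, SALI) training rows."""
    X: List[List[float]] = []
    y: List[float] = []
    for (fp_i, d_i, a_i), (fp_j, d_j, a_j) in combinations(molecules, 2):
        sim = tanimoto(fp_i, fp_j)
        if sim >= 1.0:
            continue  # SALI is undefined for identical fingerprints; skip the pair
        X.append(pair_features(d_i, d_j))
        y.append(sali(a_i, a_j, sim))
    return X, y


def cliff_score(
    predict: Callable[[List[float]], float],
    new_desc: Sequence[float],
    dataset_descs: Sequence[Sequence[float]],
) -> float:
    """Hypothetical prioritization: highest predicted SALI against any data set member."""
    return max(predict(pair_features(new_desc, d)) for d in dataset_descs)
```

With three toy molecules, the first two share three of five on-bits (Tanimoto 0.6) and differ by 2.5 activity units, giving SALI = 2.5 / 0.4 = 6.25, while pairs with no shared bits reduce to the plain activity difference. Ranking candidate molecules by `cliff_score` mirrors the prospective use described in the abstract: molecules expected to form large cliffs with existing data set members rise to the top.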