Abstract

Comparison of small molecules is a common component of many cheminformatics workflows, including the design of new compounds and libraries, side-effect prediction, and drug repurposing. Currently, large-scale comparison methods rely mostly on simple fingerprint representations of molecules, which capture the structural similarity of compounds. Methods that use 3D information depend on multiple conformer generation steps, which are computationally expensive and can strongly influence the results. The aim of this study was to augment molecule representation with spatial and physicochemical properties while avoiding conformer generation. To achieve this, we describe a molecule as an undirected graph in which the nodes correspond to atoms with pharmacophoric properties and the edges represent the distances between these features. This approach combines the benefits of a conformation-free representation of a molecule with additional spatial information. We implemented our approach as an open-source Python module called DeCAF (Discrimination, Comparison, Alignment tool for 2D PHarmacophores), freely available at http://bitbucket.org/marta-sd/decaf. We illustrate DeCAF’s strengths and weaknesses with usage examples and a thorough statistical evaluation. Additionally, we show that our method can be manually tuned to further improve the results for specific tasks. The full dataset on which DeCAF was evaluated, together with all scripts used to calculate and analyze the results, is also provided.
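The graph representation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not DeCAF's implementation: it assumes that edge "distances" are topological (the number of bonds on the shortest path between two atoms), which is one conformation-free choice, and the atom labels ("aromatic", "donor", "acceptor") are hand-assigned and hypothetical rather than produced by any real feature-typing rules. The helper names `bond_distances` and `pharmacophore_graph` are likewise illustrative.

```python
from collections import deque
from itertools import combinations


def bond_distances(n_atoms, bonds, source):
    """Breadth-first search over the bond graph: bond count from `source` to each reachable atom."""
    neighbours = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for nbr in neighbours[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def pharmacophore_graph(n_atoms, bonds, features):
    """Build an undirected feature graph (hypothetical helper).

    Nodes are atoms carrying at least one pharmacophoric label; each edge
    (a, b) with a < b stores the topological distance (bond count) between them.
    """
    nodes = {atom: set(labels) for atom, labels in features.items() if labels}
    dists = {atom: bond_distances(n_atoms, bonds, atom) for atom in nodes}
    edges = {}
    for a, b in combinations(sorted(nodes), 2):
        if b in dists[a]:  # atoms in disconnected fragments get no edge
            edges[(a, b)] = dists[a][b]
    return nodes, edges


# Hypothetical example: heavy atoms of 4-aminophenol.
# Atoms 0-5 form the aromatic ring; atom 6 is the hydroxyl O (on 0), atom 7 the amine N (on 3).
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (3, 7)]
# Illustrative, hand-assigned labels; a real tool would derive these with its own typing rules.
FEATURES = {i: {"aromatic"} for i in range(6)}
FEATURES[6] = {"donor", "acceptor"}
FEATURES[7] = {"donor"}

nodes, edges = pharmacophore_graph(8, BONDS, FEATURES)
print(edges[(6, 7)])  # 5 bonds separate the O and the N
```

Because the distances come from the bond graph alone, no 3D coordinates or conformers are needed, which mirrors the trade-off the abstract describes: some spatial information is kept without a conformer generation step.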

Highlights

  • One of the outstanding challenges in virtual screening is the development of a fast and robust algorithm to compare many compounds and identify subsets with biological activity

  • We show an example of a target with a less strict distance preference in Section 4

  • We compared 15 representations (14 fingerprints and DeCAF) using ROC AUC and the enrichment factor (EF) for the top 1% of ranked predictions
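The two evaluation metrics named in the highlights can be illustrated with a short sketch. The definitions below are common textbook ones and are assumptions about the paper's exact conventions: ROC AUC is computed as the probability that a randomly chosen active outscores a randomly chosen inactive (ties count as 0.5), and EF at 1% is the fraction of actives among the top 1% of ranked compounds divided by the fraction of actives in the whole set. How the top 1% is rounded, and how ties in the ranking are broken, are also assumptions.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """EF at `fraction`: active rate in the top-ranked subset divided by the overall active rate.

    Higher scores are ranked first; labels are 1 (active) or 0 (inactive).
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and of equal length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("EF is undefined without actives")
    # Size of the top subset; the rounding convention is an assumption (at least one compound).
    n_top = max(1, round(fraction * n_total))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    actives_in_top = sum(label for _, label in ranked[:n_top])
    return (actives_in_top / n_top) / (n_actives / n_total)


def roc_auc(scores, labels):
    """ROC AUC as the probability that a random active outscores a random inactive (ties count 0.5)."""
    actives = [s for s, y in zip(scores, labels) if y]
    inactives = [s for s, y in zip(scores, labels) if not y]
    if not actives or not inactives:
        raise ValueError("ROC AUC needs both actives and inactives")
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


# Toy ranking: two actives, two inactives.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(roc_auc(scores, labels))                            # 0.75
print(enrichment_factor(scores, labels, fraction=0.25))  # 2.0
```

The pairwise AUC loop is quadratic and is meant only to make the definition explicit; for large screening sets a rank-based formula or a library routine would be the practical choice. EF at 1% rewards methods that place actives at the very top of the list, which is why it is reported alongside the whole-curve ROC AUC.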


Summary

Introduction

One of the outstanding challenges in virtual screening is the development of a fast and robust algorithm to compare many compounds and identify subsets with biological activity. It is often desirable for such subsets to be diverse with respect to their basic molecular scaffolds. Such scaffold hopping can be used to break out of protected “patent space” or to find molecules with different, more desirable pharmacological properties. One highly popular idea is to represent a compound as a vector encoding the presence (1) or absence (0) of a set of features describing its chemical structure and properties. This approach is very efficient, as both vector generation (fingerprinting) and the screening task (vector comparison) are computationally inexpensive. However, fingerprint representations have very limited capacity, do not scale well with increasing compound size and complexity, depend strongly on a predefined set of features, and have poor scaffold-hopping performance [2,3].
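A minimal sketch of fingerprint-based screening as described above: each compound is a 0/1 vector over a predefined feature set, and the library is ranked by similarity to a query. The text does not name a similarity measure; the Tanimoto coefficient used here is a widely used choice and is an assumption, as are the 8-bit fingerprints and the compound names `mol_a` and `mol_b`, which are purely hypothetical.

```python
def to_on_bits(vector):
    """Convert a 0/1 presence/absence vector into the set of indices that are 'on'."""
    return {i for i, bit in enumerate(vector) if bit}


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention for two empty fingerprints (an assumption)
    return len(fp_a & fp_b) / union


def rank_library(query, library):
    """Screening step: sort library compounds by decreasing similarity to the query."""
    return sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)


# Hypothetical 8-bit fingerprints over an arbitrary predefined feature set.
query = to_on_bits([1, 1, 0, 1, 0, 0, 1, 0])
library = {
    "mol_a": to_on_bits([1, 1, 0, 1, 0, 0, 0, 0]),
    "mol_b": to_on_bits([0, 0, 1, 0, 1, 1, 0, 1]),
}
print(tanimoto(query, library["mol_a"]))  # 0.75
print(rank_library(query, library))       # ['mol_a', 'mol_b']
```

Both steps reduce to set operations, which illustrates why fingerprint screening is cheap. It also hints at the limitation noted above: similarity depends only on shared predefined bits, so a compound such as `mol_b` that shares no bits with the query scores 0 regardless of how its features are arranged in space.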
