Atom-pair Fingerprint Research Articles

Background: The concept of molecular similarity is one of the central ideas in cheminformatics, despite the fact that it is ill-defined and rather difficult to assess objectively. Here we propose a practical definition of molecular similarity in the context of drug discovery: molecules A and B are similar if a medicinal chemist would be likely to synthesise and test them around the same time as part of the same medicinal chemistry program. The attraction of such a definition is that it matches one of the key uses of similarity measures in early-stage drug discovery. If we make the assumption that molecules in the same compound activity table in a medicinal chemistry paper were considered similar by the authors of the paper, we can create a dataset of similar molecules from the medicinal chemistry literature. Furthermore, molecules with decreasing levels of similarity to a reference can be found by either ordering molecules in an activity table by their activity, or by considering activity tables in different papers which have at least one molecule in common.

Results: Using this procedure with activity data from ChEMBL, we have created two benchmark datasets for structural similarity that can be used to guide the development of improved measures. Compared to similar results from a virtual screen, these benchmarks are an order of magnitude more sensitive to differences between fingerprints both because of their size and because they avoid loss of statistical power due to the use of mean scores or ranks. We measure the performance of 28 different fingerprints on the benchmark sets and compare the results to those from the Riniker and Landrum (J Cheminf 5:26, 2013. doi:10.1186/1758-2946-5-26) ligand-based virtual screening benchmark.

Conclusions: Extended-connectivity fingerprints of diameter 4 and 6 are among the best performing fingerprints when ranking diverse structures by similarity, as is the topological torsion fingerprint. However, when ranking very close analogues, the atom pair fingerprint outperforms the others tested. When ranking diverse structures or carrying out a virtual screen, we find that the performance of the ECFP fingerprints significantly improves if the bit-vector length is increased from 1024 to 16,384.

Graphical abstract: An example series from one of the benchmark datasets. Each fingerprint is assessed on its ability to reproduce a specific series order.

Electronic supplementary material: The online version of this article (doi:10.1186/s13321-016-0148-0) contains supplementary material, which is available to authorized users.
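The two ideas in this abstract, checking whether a fingerprint reproduces a series order and the effect of bit-vector length, can be illustrated with a small toy sketch. Everything here is an assumption made for illustration: the feature sets are hypothetical strings rather than real fingerprint features, `fold` uses CRC32 rather than the hashing of any actual ECFP implementation, and the strict-ordering check in `reproduces_order` is one simple scoring choice, not necessarily the one used in the paper.

```python
from zlib import crc32


def fold(features, n_bits):
    """Hash each feature string to one of n_bits positions (toy folding)."""
    return {crc32(f.encode("utf-8")) % n_bits for f in features}


def tanimoto(a, b):
    """Tanimoto coefficient between two sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def reproduces_order(reference, series):
    """True if similarity to the reference strictly decreases along the series."""
    sims = [tanimoto(reference, fp) for fp in series]
    return all(x > y for x, y in zip(sims, sims[1:]))


# Hypothetical feature sets standing in for real fingerprint features.
reference = {"a", "b", "c", "d"}
series = [{"a", "b", "c", "d", "e"}, {"a", "b", "x"}, {"a", "y", "z"}]
print(reproduces_order(reference, series))

# Extreme folding (a single bit) makes two disjoint feature sets look identical.
print(tanimoto(fold({"a", "b"}, 1), fold({"x", "y"}, 1)))
```

The one-bit case is deliberately extreme: it shows how collisions in a short bit vector can inflate similarity between unrelated feature sets, which is the kind of effect a longer vector such as 16,384 bits would reduce.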

Background: Tools to explore large compound databases in search for analogs of query molecules provide a strategically important support in drug discovery to help identify available analogs of any given reference or hit compound by ligand based virtual screening (LBVS). We recently showed that large databases can be formatted for very fast searching with various 2D-fingerprints using the city-block distance as similarity measure, in particular a 2D-atom pair fingerprint (APfp) and the related category extended atom pair fingerprint (Xfp) which efficiently encode molecular shape and pharmacophores, but do not perceive stereochemistry. Here we investigated related 3D-atom pair fingerprints to enable rapid stereoselective searches in the ZINC database (23.2 million 3D structures).

Results: Molecular fingerprints counting atom pairs at increasing through-space distance intervals were designed using either all atoms (16-bit 3DAPfp) or different atom categories (80-bit 3DXfp). These 3D-fingerprints retrieved molecular shape and pharmacophore analogs (defined by OpenEye ROCS scoring functions) of 110,000 compounds from the Cambridge Structural Database with equal or better accuracy than the 2D-fingerprints APfp and Xfp, and showed comparable performance in recovering actives from decoys in the DUD database. LBVS by 3DXfp or 3DAPfp similarity was stereoselective and gave very different analogs when starting from different diastereomers of the same chiral drug. Results were also different from LBVS with the parent 2D-fingerprints Xfp or APfp. 3D- and 2D-fingerprints also gave very different results in LBVS of folded molecules where through-space distances between atom pairs are much shorter than topological distances.

Conclusions: 3DAPfp and 3DXfp are suitable for stereoselective searches for shape and pharmacophore analogs of query molecules in large databases. Web-browsers for searching ZINC by 3DAPfp and 3DXfp similarity are accessible at www.gdb.unibe.ch and should provide useful assistance to drug discovery projects.

Graphical abstract: Atom pair fingerprints based on through-space distances (3DAPfp) provide better shape encoding than atom pair fingerprints based on topological distances (APfp) as measured by the recovery of ROCS shape analogs by fp similarity.

Electronic supplementary material: The online version of this article (doi:10.1186/s13321-014-0051-5) contains supplementary material, which is available to authorized users.
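The general idea of counting atom pairs in through-space distance intervals and comparing the resulting count vectors by city-block distance can be sketched as below. This is a minimal illustration under stated assumptions, not the published 3DAPfp: the bin edges, the coordinates, and the function names `distance_histogram` and `city_block` are all hypothetical, and the abstract does not specify the actual intervals or any normalisation.

```python
from itertools import combinations
from math import dist


def distance_histogram(coords, bin_edges):
    """Count atom pairs per through-space distance interval.

    bin_edges are ascending upper bounds (hypothetical values); pairs
    farther apart than the last edge are not counted.
    """
    counts = [0] * len(bin_edges)
    for p, q in combinations(coords, 2):
        d = dist(p, q)
        for i, edge in enumerate(bin_edges):
            if d <= edge:
                counts[i] += 1
                break
    return counts


def city_block(u, v):
    """City-block (Manhattan) distance between two count vectors."""
    return sum(abs(x - y) for x, y in zip(u, v))


# Made-up 3D coordinates for two three-atom geometries.
extended = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
folded = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
edges = [1.5, 3.1, 4.5]  # illustrative interval upper bounds

fp_extended = distance_histogram(extended, edges)
fp_folded = distance_histogram(folded, edges)
print(fp_extended, fp_folded, city_block(fp_extended, fp_folded))
```

In this toy case the folded geometry puts all of its pairs into the shortest interval, so its histogram differs from the extended one even though both have the same number of atoms.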

Articles published on Atom-pair Fingerprint

Analysis and Comparison of Vector Space and Metric Space Representations in QSAR Modeling.

PubChem and ChEMBL beyond Lipinski.

WebMolCS: A Web-Based Interface for Visualizing Molecules in Three-Dimensional Chemical Spaces.

Comparing structural fingerprints using a literature-based similarity benchmark.

Web-based 3D-visualization of the DrugBank chemical space.

Stereoselective virtual screening of the ZINC database using atom pair 3D-fingerprints.

Development of a novel fingerprint for chemical reactions and its application to large-scale reaction classification and similarity.

Atom pair 2D-fingerprints perceive 3D-molecular shape and pharmacophores for very fast virtual screening of ZINC and GDB-17.

PyDPI: Freely Available Python Package for Chemoinformatics, Bioinformatics, and Chemogenomics Studies

Chemically Advanced Template Search (CATS) for Scaffold‐Hopping and Prospective Target Prediction for ‘Orphan’ Molecules
