Comparing structural fingerprints using a literature-based similarity benchmark.

Noel M O’Boyle, Roger A Sayle

doi:10.1186/s13321-016-0148-0

Abstract

Background: The concept of molecular similarity is one of the central ideas in cheminformatics, despite the fact that it is ill-defined and rather difficult to assess objectively. Here we propose a practical definition of molecular similarity in the context of drug discovery: molecules A and B are similar if a medicinal chemist would be likely to synthesise and test them around the same time as part of the same medicinal chemistry program. The attraction of such a definition is that it matches one of the key uses of similarity measures in early-stage drug discovery. If we make the assumption that molecules in the same compound activity table in a medicinal chemistry paper were considered similar by the authors of the paper, we can create a dataset of similar molecules from the medicinal chemistry literature. Furthermore, molecules with decreasing levels of similarity to a reference can be found by either ordering molecules in an activity table by their activity, or by considering activity tables in different papers which have at least one molecule in common.

Results: Using this procedure with activity data from ChEMBL, we have created two benchmark datasets for structural similarity that can be used to guide the development of improved measures. Compared to similar results from a virtual screen, these benchmarks are an order of magnitude more sensitive to differences between fingerprints both because of their size and because they avoid loss of statistical power due to the use of mean scores or ranks. We measure the performance of 28 different fingerprints on the benchmark sets and compare the results to those from the Riniker and Landrum (J Cheminf 5:26, 2013. doi:10.1186/1758-2946-5-26) ligand-based virtual screening benchmark.

Conclusions: Extended-connectivity fingerprints of diameter 4 and 6 are among the best performing fingerprints when ranking diverse structures by similarity, as is the topological torsion fingerprint. However, when ranking very close analogues, the atom pair fingerprint outperforms the others tested. When ranking diverse structures or carrying out a virtual screen, we find that the performance of the ECFP fingerprints significantly improves if the bit-vector length is increased from 1024 to 16,384.

Graphical abstract: An example series from one of the benchmark datasets. Each fingerprint is assessed on its ability to reproduce a specific series order.

Electronic supplementary material: The online version of this article (doi:10.1186/s13321-016-0148-0) contains supplementary material, which is available to authorized users.
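The abstract states that each fingerprint is assessed on its ability to reproduce a specific series order. The exact scoring procedure is not given in this excerpt, so the following is only a minimal sketch of one plausible reading: a series counts as reproduced when similarity to the reference strictly decreases along the expected order. The function names and the similarity values are hypothetical.

```python
def reproduces_series_order(similarities: list[float]) -> bool:
    """True if similarity to the reference strictly decreases along the series.

    similarities[i] is the fingerprint similarity between the reference
    molecule and the i-th molecule of the series, listed in the expected
    order from most to least similar.
    """
    return all(
        earlier > later
        for earlier, later in zip(similarities, similarities[1:])
    )


def fraction_reproduced(all_series: list[list[float]]) -> float:
    """Fraction of series whose expected order the fingerprint reproduces."""
    if not all_series:
        return 0.0
    hits = sum(reproduces_series_order(series) for series in all_series)
    return hits / len(all_series)


# Hypothetical similarity values for two series (not taken from the paper).
series_scores = [
    [0.91, 0.74, 0.60, 0.42],  # expected order reproduced
    [0.88, 0.55, 0.61, 0.30],  # third molecule scores above the second
]

print(fraction_reproduced(series_scores))
```

Scoring each series separately, rather than averaging ranks first, is in the spirit of the abstract's remark about avoiding loss of statistical power through mean scores or ranks, though the paper's own statistic may differ.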

Highlights

The concept of molecular similarity is one of the central ideas in cheminformatics, despite the fact that it is ill-defined and rather difficult to assess objectively.
Using this procedure with activity data from ChEMBL, we have created two benchmark datasets for structural similarity that can be used to guide the development of improved measures.
Identifying structurally similar molecules from ChEMBL assays: both of the new benchmarks use co-appearance in the same ChEMBL assay as an indication that two molecules are structurally similar.
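The co-appearance idea in the last highlight can be illustrated with a small sketch. It assumes the activity data has already been reduced to (assay, molecule) records; the identifiers below are hypothetical placeholders rather than real ChEMBL entries, and the function name is made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (assay_id, molecule_id) records; real ChEMBL data would
# come from the database itself and is not reproduced here.
records = [
    ("assay_1", "mol_A"),
    ("assay_1", "mol_B"),
    ("assay_1", "mol_C"),
    ("assay_2", "mol_C"),
    ("assay_2", "mol_D"),
]


def similar_pairs(records):
    """Return the set of molecule pairs that share at least one assay."""
    by_assay = defaultdict(set)
    for assay_id, molecule_id in records:
        by_assay[assay_id].add(molecule_id)

    pairs = set()
    for molecules in by_assay.values():
        # Sorting makes each pair appear in one canonical orientation.
        for first, second in combinations(sorted(molecules), 2):
            pairs.add((first, second))
    return pairs


pairs = similar_pairs(records)
print(sorted(pairs))
```

In this toy data, mol_C appears in both assays, which is the kind of shared molecule the abstract describes as a link between activity tables in different papers.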

Summary

Introduction

The concept of molecular similarity is one of the central ideas in cheminformatics, despite the fact that it is ill-defined and rather difficult to assess objectively. We propose a practical definition of molecular similarity in the context of drug discovery: molecules A and B are similar if a medicinal chemist would be likely to synthesise and test them around the same time as part of the same medicinal chemistry program. The Similar Property Principle (SPP) is the observation that structurally similar molecules tend to have similar properties [1]. This is a cornerstone of drug discovery, as it means that successive small changes to the structure of an active compound should retain biological activity against a target. The most common way to measure structural similarity is to compare molecular fingerprints, binary or count vectors that encode features of molecules. This numerical measure of similarity may be used for similarity searching, ligand-based virtual screens, clustering and diversity analysis [3,4,5].
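To make the idea of comparing binary fingerprints concrete, here is a minimal sketch using the Tanimoto coefficient. The text above does not name a specific coefficient; Tanimoto is used here as an assumption because it is a widely used choice for binary fingerprints. Each fingerprint is represented as the set of indices of its "on" bits, and the example bit sets are invented for illustration.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto (Jaccard) coefficient between two binary fingerprints.

    Each fingerprint is given as the set of indices of its "on" bits.
    """
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical on-bit indices for three molecules (not real fingerprints).
reference = {1, 4, 9, 16, 25}
close_analogue = {1, 4, 9, 16, 36}
distant_molecule = {2, 4, 8, 32, 64}

print(tanimoto(reference, close_analogue))    # 4 shared bits out of 6 distinct
print(tanimoto(reference, distant_molecule))  # 1 shared bit out of 9 distinct
```

A score of this kind is what would then be used to rank molecules against a reference in similarity searching or a ligand-based virtual screen.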

Journal: Journal of Cheminformatics	Publication Date: Jul 5, 2016
Citations: 163	License type: CC BY 4.0
