Life beyond the Tanimoto coefficient: similarity measures for interaction fingerprints

Anita Rácz, Dávid Bajusz, Károly Héberger

doi:10.1186/s13321-018-0302-y

Open Access

https://doi.org/10.1186/s13321-018-0302-y

Abstract

Background: Interaction fingerprints (IFPs) have been repeatedly shown to be valuable tools in virtual screening to identify novel hit compounds that can subsequently be optimized to drug candidates. As a complementary method to ligand docking, IFPs can be applied to quantify the similarity of predicted binding poses to a reference binding pose. For this purpose, a large number of similarity metrics can be applied, and various parameters of the IFPs themselves can be customized. In a large-scale comparison, we have assessed the effect of similarity metrics and IFP configurations on a number of virtual screening scenarios with ten different protein targets and thousands of molecules. In particular, the effect of considering general interaction definitions (such as Any Contact, Backbone Interaction and Sidechain Interaction), the effect of filtering methods and the different groups of similarity metrics were studied.

Results: The performances were primarily compared based on AUC values, but we have also used the original similarity data for the comparison of similarity metrics with several statistical tests and the novel, robust sum of ranking differences (SRD) algorithm. With SRD, we can evaluate the consistency (or concordance) of the various similarity metrics to an ideal reference metric, which is provided by data fusion from the existing metrics. Different aspects of IFP configurations and similarity metrics were examined based on SRD values with analysis of variance (ANOVA) tests.

Conclusion: A general approach is provided that can be applied for the reliable interpretation and usage of similarity measures with interaction fingerprints. Metrics that are viable alternatives to the commonly used Tanimoto coefficient were identified based on a comparison with an ideal reference metric (consensus). A careful selection of the applied bits (interaction definitions) and IFP filtering rules can improve the results of virtual screening (in terms of their agreement with the consensus metric). The open-source Python package FPKit was introduced for the similarity calculations and IFP filtering; it is available at: https://github.com/davidbajusz/fpkit.
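The SRD idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation and does not use FPKit: the function names, the toy similarity values, and the choice of the row-wise arithmetic mean as the fused reference are all assumptions made for illustration. Each metric ranks the same set of molecules, and its SRD value is the sum of absolute differences between its ranks and the ranks given by the reference (smaller means closer to the consensus).

```python
def average_ranks(values):
    """Return 1-based ascending ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srd(columns):
    """Raw (unnormalized) SRD per metric.

    columns maps a metric name to its similarity values for the same
    molecules, in the same order. The reference is the row-wise mean of
    all metrics (an assumed form of data fusion).
    """
    n = len(next(iter(columns.values())))
    reference = [sum(col[i] for col in columns.values()) / len(columns)
                 for i in range(n)]
    ref_ranks = average_ranks(reference)
    return {
        name: sum(abs(r - q) for r, q in zip(average_ranks(col), ref_ranks))
        for name, col in columns.items()
    }


# Hypothetical similarity values for four molecules under three metrics.
toy = {
    "metric_a": [0.9, 0.7, 0.4, 0.2],
    "metric_b": [0.8, 0.6, 0.5, 0.1],
    "metric_c": [0.3, 0.9, 0.2, 0.6],
}
srd_values = srd(toy)
print(srd_values)
```

In this toy case metric_a and metric_b order the molecules almost like the consensus, while metric_c deviates more and therefore receives a larger SRD value.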

Highlights

Interaction fingerprints are a relatively new concept in cheminformatics and molecular modeling [1].
In our related earlier works, we have confirmed the choice of the Tanimoto coefficient for molecular fingerprints [26], and more recently we have suggested the Baroni–Urbani–Buser (BUB) and Hawkins–Dotson (HD) coefficients for metabolomic fingerprints [25].
The AUC values were calculated with the scikit-learn Python package for each dataset and for each of the 44 similarity measures [39].
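To make the AUC evaluation concrete, here is a minimal, dependency-free sketch of the quantity being computed. The labels and similarity values are made up, and the helper `roc_auc` is a hypothetical stand-in written for this example; the source states that scikit-learn was used, where `sklearn.metrics.roc_auc_score(y_true, y_score)` returns the same pairwise statistic.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen active (label 1) scores
    higher than a randomly chosen inactive (label 0); ties count as half.
    """
    actives = [s for y, s in zip(labels, scores) if y == 1]
    inactives = [s for y, s in zip(labels, scores) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive molecule")
    wins = 0.0
    for a in actives:
        for d in inactives:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


# Hypothetical screening result: 1 = active, 0 = inactive or decoy.
labels = [1, 1, 0, 1, 0, 0]
# Hypothetical IFP similarity of each docked pose to the reference pose.
similarities = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]

auc = roc_auc(labels, similarities)
print(round(auc, 3))
```

Here eight of the nine active/inactive pairs are ordered correctly, so the AUC is 8/9. Repeating this for every dataset and every similarity measure would give the kind of AUC table the highlight refers to.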

Summary

Introduction

Interaction fingerprints are a relatively new concept in cheminformatics and molecular modeling [1]. In such a fingerprint, a bit value of 1 (“on”) denotes that the given interaction is established between the given amino acid and the small-molecule ligand, while a value of 0 (“off”) denotes the lack of that specific interaction. Two such fingerprints are most commonly compared with the Tanimoto similarity metric, which takes a value between 0 and 1, with 1 corresponding to identical fingerprints (i.e. identical protein–ligand interaction patterns). As a complementary method to ligand docking, IFPs can be applied to quantify the similarity of predicted binding poses to a reference binding pose. For this purpose, a large number of similarity metrics can be applied, and various parameters of the IFPs themselves can be customized. The effect of considering general interaction definitions (such as Any Contact, Backbone Interaction and Sidechain Interaction), the effect of filtering methods and the different groups of similarity metrics were studied.
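As a small illustration of comparing two interaction fingerprints, the sketch below computes the Tanimoto coefficient and, as one example of an alternative, the Baroni–Urbani–Buser (BUB) coefficient from the usual 2×2 contingency counts. The bit vectors and function names are hypothetical, this is not the FPKit API, and the value returned for two all-zero fingerprints is a convention assumed here.

```python
from math import sqrt


def contingency(fp_a, fp_b):
    """Count a (both on), b (only fp_a on), c (only fp_b on), d (both off)."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = sum(1 for x, y in zip(fp_a, fp_b) if x and y)
    b = sum(1 for x, y in zip(fp_a, fp_b) if x and not y)
    c = sum(1 for x, y in zip(fp_a, fp_b) if not x and y)
    d = len(fp_a) - a - b - c
    return a, b, c, d


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: a / (a + b + c). Ignores common off bits."""
    a, b, c, _ = contingency(fp_a, fp_b)
    denom = a + b + c
    return a / denom if denom else 1.0  # assumed convention for empty fingerprints


def baroni_urbani_buser(fp_a, fp_b):
    """BUB coefficient: (sqrt(a*d) + a) / (sqrt(a*d) + a + b + c)."""
    a, b, c, d = contingency(fp_a, fp_b)
    root = sqrt(a * d)
    denom = root + a + b + c
    return (root + a) / denom if denom else 1.0  # same assumed convention


# Hypothetical 8-bit fingerprints: one bit per residue/interaction pair.
reference_pose = [1, 1, 0, 1, 0, 0, 1, 0]
docked_pose = [1, 0, 0, 1, 0, 1, 1, 0]

print(tanimoto(reference_pose, docked_pose))             # 3 / 5
print(baroni_urbani_buser(reference_pose, docked_pose))  # (3 + 3) / (3 + 3 + 1 + 1)
```

The two coefficients differ because BUB also rewards shared absences (the d count), whereas Tanimoto only considers bits that are on in at least one fingerprint.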

Journal: Journal of Cheminformatics
Publication Date: Oct 4, 2018
Citations: 83
License type: open-access

Similar Papers

Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations?
Dávid Bajusz ... Anita Rácz
Journal of Cheminformatics | VOL. 7
20 May 2015

Method and model comparison by sum of ranking differences in cases of repeated observations (ties)
Klára Kollár-Hunek ... Károly Héberger
Chemometrics and Intelligent Laboratory Systems | VOL. 127
27 Jun 2013

Extended similarity indices: the benefits of comparing more than two objects simultaneously. Part 1: Theory and characteristics
Ramón Alain Miranda-Quintana ... Dávid Bajusz
Journal of Cheminformatics | VOL. 13
23 Apr 2021

How to compare separation selectivity of high-performance liquid chromatographic columns properly?
Filip Andrić ... Károly Héberger
Journal of Chromatography A | VOL. 1488
25 Jan 2017
