Abstract

An affinity fingerprint is a vector consisting of a compound’s affinity or potency against a reference panel of protein targets. Here, we present the QAFFP fingerprint, a 440-element in silico QSAR-based affinity fingerprint whose components are predicted by Random Forest regression models trained on bioactivity data from the ChEMBL database. Both real-valued (rv-QAFFP) and binary (b-QAFFP) versions of the QAFFP fingerprint were implemented, and their performance in similarity searching, biological activity classification and scaffold hopping was assessed and compared to that of the 1024-bit Morgan2 fingerprint (the RDKit implementation of the ECFP4 fingerprint). In both similarity searching and biological activity classification, the QAFFP fingerprint yields retrieval rates, measured by AUC (~ 0.65 and ~ 0.70 for similarity searching, depending on the data sets, and ~ 0.85 for classification) and EF5 (~ 4.67 and ~ 5.82 for similarity searching, depending on the data sets, and ~ 2.10 for classification), comparable to those of the Morgan2 fingerprint (similarity searching AUC of ~ 0.57 and ~ 0.66 and EF5 of ~ 4.09 and ~ 6.41, depending on the data sets; classification AUC of ~ 0.87 and EF5 of ~ 2.16). However, the QAFFP fingerprint outperforms the Morgan2 fingerprint in scaffold hopping: it retrieves 1146 of the 1749 existing scaffolds, whereas the Morgan2 fingerprint retrieves only 864.
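The two quantities behind the similarity-searching numbers above can be illustrated with a minimal sketch. It assumes that binary fingerprints are compared with the Tanimoto coefficient and that EF5 is the hit rate in the top 5% of the ranked list divided by the hit rate in the whole list. Both are common conventions, but the abstract does not state the exact definitions used in the study, and the function names `tanimoto` and `enrichment_factor` are hypothetical.

```python
import math


def tanimoto(a, b):
    """Tanimoto similarity of two equal-length binary fingerprints (assumed metric)."""
    both = sum(1 for x, y in zip(a, b) if x and y)      # bits set in both
    either = sum(1 for x, y in zip(a, b) if x or y)     # bits set in at least one
    return both / either if either else 0.0


def enrichment_factor(scores, labels, fraction=0.05):
    """Enrichment factor: hit rate in the top `fraction` over the overall hit rate.

    `scores` are similarity scores (higher = more similar to the query),
    `labels` are 1 for actives and 0 for inactives.
    """
    n = len(scores)
    n_top = max(1, math.ceil(n * fraction))             # size of the top slice
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])
    hits_all = sum(labels)
    if hits_all == 0:
        return 0.0
    return (hits_top / n_top) / (hits_all / n)
```

With 20 ranked compounds of which 2 are active, the top 5% is a single compound; if that compound is active, the function returns 10, the maximum attainable for that library.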

Highlights

  • Virtual screening (VS) is a set of computational approaches used in the early stages of the drug discovery process

  • While at this setting the binary QSAR affinity fingerprint (b-QAFFP) yields statistically significantly better AUC for both the heterogeneous (HET) and homogeneous (HOM) data sets, EF5 is significantly better for the Morgan fingerprint with a radius of 2 (Morgan2) in the case of the HET data sets (p-value of 6.70e−04 for the alternative Morgan2 > QAFFP), and there are no significant differences in EF5 between the b-QAFFP and Morgan2 fingerprints for the HOM data sets

  • The model applicability domain (AD) was estimated by inductive conformal prediction (ICP) at a confidence level of 90%. rv-QAFFP models were trained using raw data
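The inductive conformal prediction step mentioned in the last highlight can be sketched generically for regression. This is not the authors' implementation: it assumes the plain absolute-residual nonconformity score on a held-out calibration set, and the function name `icp_halfwidth` is hypothetical. How the resulting interval is turned into an applicability-domain decision is not described in this excerpt and is left out.

```python
import math


def icp_halfwidth(cal_true, cal_pred, confidence=0.90):
    """Half-width of an ICP prediction interval at the given confidence.

    Uses absolute residuals on a calibration set as nonconformity scores
    (an assumption; other nonconformity functions exist).
    """
    scores = sorted(abs(t - p) for t, p in zip(cal_true, cal_pred))
    n = len(scores)
    k = math.ceil((n + 1) * confidence)   # rank of the score to use (1-based)
    if k > n:
        return math.inf                   # too few calibration points
    return scores[k - 1]


def icp_interval(prediction, halfwidth):
    """Symmetric prediction interval around a point prediction."""
    return (prediction - halfwidth, prediction + halfwidth)
```

At a 90% confidence level, intervals built this way are expected to contain the true value for about 90% of new compounds, provided the calibration and test data are exchangeable.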


Summary

Introduction

Virtual screening (VS) is a set of computational approaches used in the early stages of the drug discovery process. While the COMPARE profile is based on a cellular response, bioactivity profiles were constructed using molecular target properties. In the so-called ‘affinity fingerprint approach’, 122 small molecules were encoded by their binding potencies against a reference panel of eight proteins [11], and a regression model was used to predict compound potencies on two new targets. Apart from affinity fingerprints and biospectra, several other names have been proposed for the description of a molecule by its experimentally determined bioactivity profile: chemical genomic profile [14], chemical-genetic fingerprint [15] or activity spectrum [16, 17].
