QSAR-derived affinity fingerprints (part 2): modeling performance for potency prediction

Isidro Cortés-Ciriano, Ctibor Škuta, Daniel Svozil, Andreas Bender

doi:10.1186/s13321-020-00444-5

Abstract

Affinity fingerprints report the activity of small molecules across a set of assays. They thus make it possible to gather information about the bioactivities of structurally dissimilar compounds, for which models based on chemical structure alone are often limited, and to model complex biological endpoints such as human toxicity and in vitro cancer cell line sensitivity. Here, we propose to model in vitro compound activity using computationally predicted bioactivity profiles as compound descriptors. To this end, we apply and validate a framework for the calculation of QSAR-derived affinity fingerprints (QAFFP) using a set of 1360 QSAR models generated from Ki, Kd, IC50 and EC50 data in the ChEMBL database. QAFFP thus represent a method to encode and relate compounds on the basis of their similarity in bioactivity space. To benchmark the predictive power of QAFFP, we assembled IC50 data from the ChEMBL database for 18 diverse cancer cell lines widely used in preclinical drug discovery, and for 25 diverse protein target data sets. This study complements part 1, in which the performance of QAFFP in similarity searching, scaffold hopping, and bioactivity classification is evaluated. Although QAFFP are inherently noisy, we show that using them as descriptors leads to test-set prediction errors of approximately 0.65–0.95 pIC50 units, which are comparable to the estimated uncertainty of bioactivity data in ChEMBL (0.76–1.00 pIC50 units). We find that the predictive power of QAFFP is slightly worse than that of Morgan2 fingerprints and 1D and 2D physicochemical descriptors, with an effect size in the range of 0.02–0.08 pIC50 units. Including QSAR models with low predictive power in the generation of QAFFP does not lead to improved predictive power. Given that the QSAR models we used to compute the QAFFP were selected on the basis of data availability alone, we anticipate better modeling results for QAFFP generated using more diverse and biologically meaningful targets.
Data sets and Python code are publicly available at https://github.com/isidroc/QAFFP_regression.
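
The idea of using predicted bioactivities as compound descriptors can be sketched in a few lines of Python. This is an illustration only, not the code from the repository above: `make_linear_model`, `affinity_fingerprint`, and `rmse` are hypothetical helper names, the three toy linear models merely stand in for the 1360 trained QSAR models, and all numbers are made up.

```python
from math import sqrt
from typing import Callable, List, Sequence

# Hypothetical stand-in for a trained QSAR model: it maps a structural
# descriptor vector to a predicted potency (e.g. a pIC50 value).
Model = Callable[[Sequence[float]], float]


def make_linear_model(weights: Sequence[float], bias: float) -> Model:
    """Build a toy linear 'QSAR model' (assumption: real models are more complex)."""
    def predict(x: Sequence[float]) -> float:
        return bias + sum(w * xi for w, xi in zip(weights, x))
    return predict


def affinity_fingerprint(x: Sequence[float], models: Sequence[Model]) -> List[float]:
    """Return one predicted bioactivity per model; the vector is the fingerprint."""
    return [model(x) for model in models]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, here in pIC50 units."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Three toy models in place of the full panel of QSAR models.
models = [
    make_linear_model([1.0, 0.0], 5.0),
    make_linear_model([0.0, 2.0], 4.0),
    make_linear_model([0.5, 0.5], 6.0),
]

# Toy structural descriptors of one compound (made-up values).
compound = [1.0, 0.5]
fingerprint = affinity_fingerprint(compound, models)

# Toy error calculation on two made-up observed/predicted pIC50 pairs.
error = rmse([6.0, 7.0], [6.5, 6.5])
```

In this sketch the fingerprint has one entry per model, so two compounds can be compared by their predicted activity profiles rather than by their chemical structure, and the resulting vector can be passed to any regression algorithm as a descriptor.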

Highlights

A major research question in Quantitative Structure–Activity Relationship (QSAR) modeling has been how to numerically encode small molecules [1,2,3,4].
The use of a linear model to assess the predictive power of QAFFP, Morgan2 fingerprints and Physchem descriptors allowed us to control for the variability across data sets, and to prevent the results from being biased by factors such as the number of datapoints or data set modellability.
We have performed a comprehensive assessment of the performance of regression models trained on QSAR-derived affinity fingerprints (QAFFP).
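
One way to see why a linear model helps here is the following minimal sketch. It is not the authors' model specification, which is not given in this text. It assumes an additive model of the form error = data set effect + descriptor effect and a balanced design in which every descriptor type is evaluated on every data set. Under those assumptions, the least-squares estimate of the difference between two descriptor types is the mean of the within-data-set differences, so data set difficulty cancels out. The data set names and RMSE values below are made up for illustration.

```python
from typing import Dict

# Made-up test-set RMSE values (pIC50 units) for three hypothetical data sets.
# Each data set is evaluated with every descriptor type (balanced design).
results: Dict[str, Dict[str, float]] = {
    "dataset_A": {"QAFFP": 0.80, "Morgan2": 0.75},
    "dataset_B": {"QAFFP": 0.90, "Morgan2": 0.86},
    "dataset_C": {"QAFFP": 0.70, "Morgan2": 0.64},
}


def descriptor_effect(
    results: Dict[str, Dict[str, float]], descriptor: str, reference: str
) -> float:
    """Mean within-data-set RMSE difference between two descriptor types.

    For a balanced additive model (data set + descriptor), this equals the
    least-squares estimate of the descriptor effect relative to `reference`.
    """
    diffs = [row[descriptor] - row[reference] for row in results.values()]
    return sum(diffs) / len(diffs)


# Positive value: QAFFP has a higher error than Morgan2 in this toy example.
effect = descriptor_effect(results, "QAFFP", "Morgan2")
```

Because each difference is taken within a data set, a data set that is hard to model for every descriptor type does not inflate or mask the estimated effect.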

Summary

Introduction

A major research question in Quantitative Structure–Activity Relationship (QSAR) modeling has been (and still is) how to numerically encode small molecules [1,2,3,4]. The set of bioactivities across a panel of assays is usually known as an affinity, bioactivity, protein or high-throughput screening fingerprint [12,13,14,15]. Note that the term ‘affinity fingerprint’ is often used even when the bioactivity endpoints are not Ki or Kd values but rather assay-specific metrics of potency, such as IC50 or EC50 values, so it comprises a broad set of activity spectra-based descriptors. In the following and in the accompanying manuscript, we use the term affinity fingerprint to refer to the set of biological endpoints, experimentally determined or predicted, irrespective of whether the endpoint measured corresponds to a potency or an affinity metric. For a comprehensive review of existing methods to predict affinity fingerprints using existing high-throughput data [16,17,18,19,20,21,22,23,24,25,26,27,28], the reader is referred to the introduction of the accompanying manuscript [29].

Methods

Results

Conclusion

Journal: Journal of Cheminformatics
Publication Date: Jun 5, 2020
Citations: 17
License type: open-access
