Abstract

This study aims to improve on existing activity prediction methods by augmenting chemical structure fingerprints with bioactivity-based fingerprints derived from high-throughput screening (HTS) data (HTSFPs), thereby showcasing the benefits of combining different descriptor types. This type of descriptor would be applied in an iterative screening scenario for more targeted compound set selection. The HTSFPs were generated from HTS data obtained from PubChem and combined with an ECFP4 structural fingerprint. The resulting bioactivity-structure hybrid (BaSH) fingerprint was benchmarked against the individual ECFP4 and HTSFP fingerprints, and performance was evaluated via retrospective analysis of a subset of the PubChem HTS data. Results showed that the BaSH fingerprint has improved predictive performance as well as improved scaffold hopping capability. The BaSH fingerprint identified unique compounds that neither the ECFP4 nor the HTSFP fingerprint identified, indicating synergistic effects between the two fingerprints. A feature importance analysis showed that a small subset of the HTSFP features contributes most to the overall performance of the BaSH fingerprint. This hybrid approach allows for activity prediction of compounds with only sparse HTSFPs, owing to the supporting effect of the structural fingerprint.
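As an illustration of the hybrid idea described in the abstract, the following minimal Python sketch builds a combined vector from a structural bit vector and a per-assay activity vector. Several details here are assumptions rather than the authors' stated method: the abstract says only that the two fingerprints were "combined", so plain concatenation is assumed; untested assays are encoded as 0; and the function names `build_htsfp` and `combine_fingerprints` are hypothetical. The ECFP4 bits are taken as precomputed input rather than generated from a structure.

```python
from typing import List, Optional, Sequence


def build_htsfp(outcomes: Sequence[Optional[bool]]) -> List[int]:
    """Encode per-assay outcomes as bits.

    True (active) -> 1; False (inactive) or None (untested) -> 0.
    Treating untested assays as 0 is an assumption of this sketch.
    """
    return [1 if outcome else 0 for outcome in outcomes]


def combine_fingerprints(ecfp_bits: Sequence[int], htsfp: Sequence[int]) -> List[int]:
    """Concatenate structural and bioactivity bits into one hybrid vector.

    Concatenation is assumed here; the abstract does not specify how the
    two fingerprints were combined.
    """
    return list(ecfp_bits) + list(htsfp)


# Toy example: a 4-bit "structural" vector and three assays.
ecfp_bits = [1, 0, 1, 1]
well_tested = build_htsfp([True, None, False])
never_tested = build_htsfp([None, None, None])

hybrid_tested = combine_fingerprints(ecfp_bits, well_tested)
hybrid_sparse = combine_fingerprints(ecfp_bits, never_tested)
```

In this toy case the compound with no assay data still carries its structural bits in the hybrid vector, which is the kind of supporting effect the abstract attributes to the structural fingerprint for compounds with sparse HTSFPs.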

Highlights

  • The traditional and most intuitive method of predicting compound activity is through the use of structure-activity relationship (SAR) models

  • The results indicate that this combined fingerprint is a useful tool for scaffold hopping, detecting a more diverse set of active compounds with different scaffolds and identifying novel scaffolds that were not identified with either the ECFP4 or the HTS Fingerprint (HTSFP)

  • Feature importance analysis quantified the relative contributions of ECFP4 and HTSFP to the bioactivity-structure hybrid (BaSH) predictions, revealing that a small subset of the HTSFP features contributes most to the overall performance


Summary

Introduction

The traditional and most intuitive method of predicting compound activity is through the use of structure-activity relationship (SAR) models. While SAR-based activity prediction is practical and often effective, its predictions are based on structural similarity and are therefore inherently limited in structural diversity. This limits scaffold hopping potential and the exploration of chemical space, and impedes the identification of novel active compounds. A study presented by Fliri et al. used a somewhat larger database to build bioactivity profiles termed ‘biospectra’ to predict compound-target activities [4]. This bioactivity profile was based on a panel of 1567 compounds and 92 assays representing a diverse cross-section of the proteome.

