Abstract

Chemical descriptors encode the physicochemical and structural properties of small molecules, and they are at the core of chemoinformatics. The broad release of bioactivity data has prompted enriched representations of compounds, reaching beyond chemical structures and capturing their known biological properties. Unfortunately, bioactivity descriptors are not available for most small molecules, which limits their applicability to a few thousand well-characterized compounds. Here we present a collection of deep neural networks able to infer bioactivity signatures for any compound of interest, even when little or no experimental information is available for it. Our signaturizers relate to bioactivities of 25 different types (including target profiles, cellular response and clinical outcomes) and can be used as drop-in replacements for chemical descriptors in day-to-day chemoinformatics tasks. Indeed, we illustrate how inferred bioactivity signatures are useful to navigate the chemical space in a biologically relevant manner, unveiling higher-order organization in natural product collections, and to enrich mostly uncharacterized chemical libraries for activity against the drug-orphan target Snail1. Moreover, we implement a battery of signature-activity relationship (SigAR) models and show a substantial improvement in performance, with respect to chemistry-based classifiers, across a series of biophysics and physiology activity prediction benchmarks.

Highlights

  • Chemical descriptors encode the physicochemical and structural properties of small molecules, and they are at the core of chemoinformatics

  • Inference of bioactivity signatures can be posed as a metric learning problem where observed compound–compound similarities of a given kind are correlated to the full repertoire of CC signatures, so that similarity measures are possible for any compound of interest, including those that are not annotated with experimental data

  • We feed the Siamese Neural Network (SNN) with triplets of molecules (an anchor molecule, one that is similar to the anchor and one that is not), and we ask the SNN to correctly classify this pattern with a distance measurement performed in the embedding space (Fig. 1a and Supplementary Fig. 1)
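The triplet setup described in the last highlight can be illustrated with a small sketch. This is not the authors' implementation: the margin-based hinge loss, the Euclidean distance, the function names and the toy 2D embeddings are all assumptions chosen for illustration, since the text only states that triplets are classified with a distance measured in the embedding space.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed form, not taken from the paper).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`; otherwise it grows linearly.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def triplet_is_satisfied(anchor, positive, negative):
    """True when the similar molecule is embedded closer to the anchor."""
    return euclidean(anchor, positive) < euclidean(anchor, negative)


# Toy 2D embeddings (hypothetical values, for illustration only).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # similar molecule, distance 1 from the anchor
negative = [3.0, 4.0]   # dissimilar molecule, distance 5 from the anchor

loss_ok = triplet_margin_loss(anchor, positive, negative)
loss_swapped = triplet_margin_loss(anchor, negative, positive)
```

In this toy case the correctly ordered triplet incurs no loss, while swapping the similar and dissimilar molecules is penalized. A trained network would adjust its embedding to drive such penalties toward zero across many triplets.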


Summary

Introduction

Chemical descriptors encode the physicochemical and structural properties of small molecules, and they are at the core of chemoinformatics. Molecular fingerprints are a widespread form of descriptors consisting of binary (1/0) vectors describing the presence or absence of certain molecular substructures. These encodings are fundamental in compound similarity searches and clustering, and are applied to computational drug discovery (CDD), structure optimization, and target prediction. We make bioactivity signatures available for any given compound, assigning confidence to our predictions and illustrating how they can be used to navigate the chemical space in an efficient, biologically relevant manner. We explore their added value in the identification of hit compounds against the drug-orphan target Snail1 in a mostly uncharacterized compound library, and through the implementation of a battery of signature–activity relationship (SigAR) models to predict biophysical and physiological properties of molecules.
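To make the role of binary fingerprints in similarity searches concrete, here is a minimal sketch using the Tanimoto (Jaccard) coefficient, a common choice for comparing such vectors. The text does not name a specific similarity measure, so this choice is an assumption, and the 8-bit fingerprints and molecule names are hypothetical toy data rather than real substructure keys.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two binary (1/0) fingerprints.

    Defined as shared on-bits divided by on-bits present in either vector.
    Returns 0.0 when both fingerprints are empty.
    """
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    return both / either if either else 0.0


def rank_by_similarity(query, library):
    """Return library molecule names sorted from most to least similar."""
    return sorted(library, key=lambda name: tanimoto(query, library[name]),
                  reverse=True)


# Hypothetical 8-bit fingerprints; each position flags one substructure.
query = [1, 1, 0, 1, 0, 0, 1, 0]
library = {
    "mol_a": [1, 1, 0, 1, 0, 0, 0, 0],
    "mol_b": [0, 0, 1, 0, 1, 1, 0, 0],
    "mol_c": [1, 0, 0, 1, 0, 0, 1, 1],
}

ranking = rank_by_similarity(query, library)
```

The same ranking pattern would apply if the binary fingerprints were replaced by continuous signature vectors and the Tanimoto coefficient by a suitable distance, which is the sense in which inferred signatures could act as drop-in replacements for chemical descriptors.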

