Scope of 3D Shape-Based Approaches in Predicting the Macromolecular Targets of Structurally Complex Small Molecules Including Natural Products and Macrocyclic Ligands.

Ya Chen, Johannes Kirchmair, Neann Mathai

doi:10.1021/acs.jcim.0c00161

Abstract

A plethora of similarity-based, network-based, machine learning, docking and hybrid approaches for predicting the macromolecular targets of small molecules are available today and recognized as valuable tools for providing guidance in early drug discovery. With the increasing maturity of target prediction methods, researchers have started to explore ways to expand their scope to more challenging molecules such as structurally complex natural products and macrocyclic small molecules. In this work, we systematically explore the capacity of an alignment-based approach to identify the targets of structurally complex small molecules (including large and flexible natural products and macrocyclic compounds) based on the similarity of their 3D molecular shape to noncomplex molecules (i.e., more conventional, “drug-like”, synthetic compounds). For this analysis, query sets of 10 representative, structurally complex molecules were compiled for each of the 28 pharmaceutically relevant proteins. Subsequently, ROCS, a leading shape-based screening engine, was utilized to generate rank-ordered lists of the potential targets of the 28 × 10 queries according to the similarity of their 3D molecular shapes with those of compounds from a knowledge base of 272 640 noncomplex small molecules active on a total of 3642 different proteins. Four of the scores implemented in ROCS were explored for target ranking, with the TanimotoCombo score consistently outperforming all others. The score successfully recovered the targets of 30% and 41% of the 280 queries among the top-5 and top-20 positions, respectively. For 24 out of the 28 investigated targets (86%), the method correctly assigned the first rank (out of 3642) to the target of interest for at least one of the 10 queries. The shape-based target prediction approach showed remarkable robustness, with good success rates obtained even for compounds that are clearly distinct from any of the ligands present in the knowledge base. 
However, complex natural products and macrocyclic compounds proved to be challenging even with this approach, although cases of complete failure were recorded only for a small number of targets.
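The ranking and evaluation logic described above can be illustrated with a minimal sketch. It does not call ROCS and is not the authors' implementation. It assumes that similarity scores between a query and each knowledge-base ligand have already been computed (for example, TanimotoCombo values, which range from 0 to 2), and that each target is scored by the best-scoring ligand annotated to it. That aggregation rule, the function names, and the toy identifiers and scores are all assumptions made for illustration.

```python
def rank_targets(scores, ligand_targets):
    """Rank targets by the best similarity score among their known ligands.

    scores: dict mapping ligand id -> similarity of that ligand to the query
    ligand_targets: dict mapping ligand id -> iterable of target ids
    Assumption: a target's score is the maximum over its ligands.
    """
    best = {}
    for ligand, score in scores.items():
        for target in ligand_targets.get(ligand, ()):
            best[target] = max(best.get(target, float("-inf")), score)
    # Highest score first; ties are broken by target id for reproducibility.
    return sorted(best, key=lambda t: (-best[t], t))


def top_k_success_rate(rankings, true_targets, k):
    """Fraction of queries whose true target appears among the top-k ranks."""
    hits = sum(1 for ranking, truth in zip(rankings, true_targets)
               if truth in ranking[:k])
    return hits / len(true_targets)


# Hypothetical toy data: three knowledge-base ligands, three targets.
ligand_targets = {"lig_a": ["T1"], "lig_b": ["T2", "T3"], "lig_c": ["T2"]}
query_scores = [
    {"lig_a": 1.32, "lig_b": 0.85, "lig_c": 1.10},  # query 1
    {"lig_a": 0.40, "lig_b": 0.95, "lig_c": 0.70},  # query 2
]
true_targets = ["T1", "T1"]  # the known target of each query

rankings = [rank_targets(s, ligand_targets) for s in query_scores]
top1 = top_k_success_rate(rankings, true_targets, k=1)
top3 = top_k_success_rate(rankings, true_targets, k=3)
```

In this toy case the true target is ranked first for query 1 and last for query 2, so the top-1 success rate is 0.5 and the top-3 success rate is 1.0. The top-5 and top-20 rates reported above are the same kind of statistic, computed over 280 queries and 3642 candidate targets.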

Highlights

The past decade has seen a boost in the development of in silico approaches for the prediction of the macromolecular targets of small molecules.[1−3] Progress has been fueled by, among other factors, (i) the increasing amount of chemical and biological data available in the public domain, (ii) the strategic shift from the “one drug-one target” paradigm that had dominated small-molecule drug discovery for decades to the concept of polypharmacology,[4] and (iii) advances in computational power and algorithms.
Several classes of in silico approaches for target prediction exist: (i) similarity-based methods, which use the similarity between data such as small molecules, targets, and interactions to make predictions,[6] (ii) network-based methods, where networks based on anything from ligand similarity[7] to highly heterogeneous data are built to gain a systemic understanding of the modeled data,[8] (iii) machine learning approaches, which use methods such as random forests, support vector machines, or artificial neural networks to make predictions,[9] (iv) reverse docking methods, which dock queries into potential targets to make predictions based on docking scores,[3] and (v) hybrid methods, which combine two or more of these approaches.[1]
The aim of this work is to determine the capacity of 3D alignment-dependent shape-based approaches to predict the macromolecular targets of complex small molecules (CSMs) based on their similarity to non-CSMs with measured bioactivities (Figure 2).

Summary

Introduction

The past decade has seen a boost in the development of in silico approaches for the prediction of the macromolecular targets of small molecules.[1−3] Progress has been fueled by, among other factors, (i) the increasing amount of chemical and biological data available in the public domain, (ii) the strategic shift from the “one drug-one target” paradigm that had dominated small-molecule drug discovery for decades to the concept of polypharmacology,[4] and (iii) advances in computational power and algorithms. PubChem currently contains more than 102 million compounds and 268 million bioactivity data points,[13] and the latest release of the ChEMBL database contains close to 2 million compounds, with more than 16 million measured activities.[14]

Objectives

Methods

Results