Abstract

Most drug discovery programs today originate with the selection of 'hit' molecules identified in assays against large compound screening libraries. The chemical space in which these hits reside has implications for their biological activity in vivo and their likelihood of progression to a drug candidate. We have created a database of commercially available screening compounds and natural products in order to analyse the drug- and lead-likeness of commercial screening compounds and to compare them with i) orally administered drugs, ii) non-orally administered drugs, and iii) compounds with significant biological activity but an unspecified or not yet determined route of administration, taken from the public databases DrugBank and ChEMBL. The data set contained 15.5 million entries from 102 vendors, which resulted in just over 8 million unique chemical structures. We review these data for current drug/lead-likeness, then apply substructure-based filters for promiscuity and unwanted groups, and finally compare chemical properties for structures within the different sub-sets. While the majority of the commercial compounds satisfy various drug-likeness rules, they show a larger molecular weight and higher hydrophobicity than orally available drugs, with generally higher aromaticity and lower solubility. This 'right shift' of chemical properties is also found in the majority of the compounds with significant biological activity in ChEMBL, reflecting a common trend in current drug discovery towards larger, more hydrophobic compounds and fewer drug-like compounds. In particular, successful drugs were found to possess much lower median logD values than those found for compound collections. In addition, commercial compounds show a quite narrow distribution in molecular weight, with a median absolute deviation of only 78 Da around a median of 387 Da.
For high-throughput screening, a highly stringent combination of several lead-likeness and substructure filters against unwanted groups could be applied, resulting in 2 million lead-like structures. For fragment-based screening approaches, the rule of three (Ro3) would select around 400,000 structures.
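
As an illustration of how a rule-of-three selection might be applied to a compound table, the following is a minimal sketch and not the procedure used in this work. It assumes that descriptors have already been computed elsewhere, and it uses the commonly cited Ro3 thresholds (molecular weight ≤ 300 Da, cLogP ≤ 3, at most 3 hydrogen-bond donors and at most 3 acceptors); the abstract does not state the exact cut-offs applied. The names `Descriptors` and `passes_ro3` and all compound values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties for one structure (hypothetical container)."""
    mol_weight: float      # molecular weight in Da
    clogp: float           # calculated logP
    h_bond_donors: int
    h_bond_acceptors: int


def passes_ro3(d: Descriptors,
               max_mw: float = 300.0,
               max_clogp: float = 3.0,
               max_hbd: int = 3,
               max_hba: int = 3) -> bool:
    """Return True if all four properties are within the assumed Ro3 limits."""
    return (d.mol_weight <= max_mw
            and d.clogp <= max_clogp
            and d.h_bond_donors <= max_hbd
            and d.h_bond_acceptors <= max_hba)


# Made-up example entries; not taken from the data set described above.
library = {
    "fragment_a": Descriptors(150.0, 1.2, 1, 2),
    "compound_b": Descriptors(387.0, 3.5, 2, 5),
    "fragment_c": Descriptors(298.0, 2.9, 3, 3),
}

# Keep only the entries that satisfy every assumed Ro3 criterion.
fragments = [name for name, d in library.items() if passes_ro3(d)]
print(fragments)
```

The thresholds are exposed as keyword arguments so that a stricter or extended variant (for example, with additional limits on rotatable bonds or polar surface area) can be tried without changing the function body.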
