Abstract

As the Protein Data Bank (PDB) recently passed the mark of 123 456 structures, it stands more than ever as an important resource, not only for analyzing the structural features of specific biological systems but also for studying the prevalence of structural patterns across a large body of unrelated structures, which may reflect rules governing protein folding or molecular recognition. Here, we compiled a list of 11 016 unique structures of small-molecule ligands bound to proteins (6444 of which have experimental binding affinities), representing 750 873 protein–ligand atomic interactions, and analyzed the frequency, geometry and impact of each interaction type. We find that hydrophobic interactions are generally enriched in high-efficiency ligands, whereas polar interactions are over-represented in fragment inhibitors. While most observations extracted from the PDB will be familiar to seasoned medicinal chemists, less expected findings, such as the high number of C-H···O hydrogen bonds or the relatively frequent amide-π stacking between backbone amides of proteins and aromatic rings of ligands, uncover underused ligand design strategies.
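Counting an interaction type such as the C-H···O hydrogen bond across thousands of structures ultimately rests on a geometric test applied to atomic coordinates. The sketch below shows one such test. It assumes an H···O distance cutoff of 2.7 Å and a minimum C-H···O angle of 120°; these values are illustrative placeholders rather than the criteria used in the study, and the function names are hypothetical.

```python
import math

# Illustrative cutoffs only (assumption): not the study's actual criteria.
MAX_H_O_DIST = 2.7       # Å, maximum H···O distance
MIN_C_H_O_ANGLE = 120.0  # degrees, minimum C-H···O angle measured at the hydrogen

Point = tuple[float, float, float]


def angle_deg(a: Point, vertex: Point, c: Point) -> float:
    """Return the angle a-vertex-c in degrees."""
    u = [ai - vi for ai, vi in zip(a, vertex)]
    w = [ci - vi for ci, vi in zip(c, vertex)]
    cos_t = sum(x * y for x, y in zip(u, w)) / (math.hypot(*u) * math.hypot(*w))
    # Clamp to [-1, 1] so floating-point rounding cannot push acos out of its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def is_ch_o_hbond(c: Point, h: Point, o: Point,
                  max_dist: float = MAX_H_O_DIST,
                  min_angle: float = MIN_C_H_O_ANGLE) -> bool:
    """Hypothetical test: does the C-H···O triplet satisfy both cutoffs?"""
    return math.dist(h, o) <= max_dist and angle_deg(c, h, o) >= min_angle


if __name__ == "__main__":
    carbon, hydrogen = (0.0, 0.0, 0.0), (1.09, 0.0, 0.0)
    print(is_ch_o_hbond(carbon, hydrogen, (3.5, 0.0, 0.0)))   # True: linear, H···O = 2.41 Å
    print(is_ch_o_hbond(carbon, hydrogen, (1.09, 2.5, 0.0)))  # False: C-H···O angle is 90°
```

Keeping the cutoffs as keyword parameters makes it easy to check how sensitive an interaction count is to the chosen thresholds.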

Highlights

  • Significant progress in high-throughput X-ray crystallography[1,2], combined with advances in structural genomics[3,4,5], has led to an explosion in the number of structures publicly available in the Protein Data Bank (PDB).[6]

  • A statistical analysis of the nature, geometry and frequency of atomic interactions between small-molecule ligands and their receptors in the PDB could inform the rational optimization of chemical series, help in the interpretation of difficult SAR, aid the development of protein–ligand interaction fingerprints, and serve as a knowledge base for the improvement of scoring functions used in virtual screening.

  • We find that efficient ligands are more hydrophobic: the median number of heavy atoms and median log D (ChemAxon) are 27 and 1.7, respectively, for compounds with high fit quality (FQ), versus 21 and 0.2 for compounds with low FQ.
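Fit quality normalizes ligand efficiency (LE) for ligand size, because LE tends to fall as the heavy-atom count grows. The sketch below assumes the definition commonly attributed to Reynolds and co-workers, FQ = LE / LE_scale, with LE derived from a pK value at 298.15 K and their empirical polynomial for LE_scale. The source does not state the formula or the thresholds it used to call FQ "high" or "low", and the function names here are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol·K)
T_KELVIN = 298.15     # assumed temperature


def ligand_efficiency(p_activity: float, heavy_atoms: int) -> float:
    """LE = -ΔG / HA in kcal/mol per heavy atom, with -ΔG = RT·ln(10)·pK."""
    return math.log(10) * R_KCAL * T_KELVIN * p_activity / heavy_atoms


def le_scale(heavy_atoms: int) -> float:
    """Size-dependent scaling polynomial (assumed Reynolds et al. form)."""
    ha = float(heavy_atoms)
    return 0.0715 + 7.5328 / ha + 25.7079 / ha**2 - 361.4722 / ha**3


def fit_quality(p_activity: float, heavy_atoms: int) -> float:
    """FQ = LE / LE_scale; FQ ≈ 1 means LE matches the scaling curve for that size."""
    scale = le_scale(heavy_atoms)
    if scale <= 0:
        # The polynomial is an empirical fit and turns negative for very small ligands.
        raise ValueError("LE_scale is not meaningful for this heavy-atom count")
    return ligand_efficiency(p_activity, heavy_atoms) / scale


if __name__ == "__main__":
    print(round(fit_quality(8.0, 27), 2))  # 1.1
```

For example, a 27-heavy-atom ligand with pK = 8 gives LE ≈ 0.40 kcal/mol per heavy atom and FQ ≈ 1.1; raising the potency at fixed size raises FQ proportionally.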


Summary

Introduction

Significant progress in high-throughput X-ray crystallography[1,2], combined with advances in structural genomics[3,4,5], has led to an explosion in the number of structures publicly available in the Protein Data Bank (PDB).[6]

In salt bridges, the imbalance between ligand anions and ligand cations probably reflects the higher number of ligands containing carboxylic acids (1849) than ammonium groups (1103) in the PDB, since the frequency of arginine (5.6%) and lysine (5.0%) in proteins is similar to that of aspartic acid (5.4%) and glutamic acid (3.8%) (UniProtKB/TrEMBL, UniProt release 2017_03).[64] Arginine was the cation in 83.6% of all interactions (Fig. S10†). This seems to be in agreement with quantum mechanical calculations, which predict that arginine side chains are more inclined than lysine side chains to form salt bridges.[65] The distribution of negatively charged oxygens around the guanidinium group of arginine shows a higher density around the terminal (ω) nitrogens than at the secondary amine (ε) nitrogen (Fig. S10†). A similar trend was previously observed for peptidic interactions.[80] This preference has been attributed to the fact that the guanidinium group of arginine can donate several hydrogen bonds while simultaneously binding to an aromatic ring.[73]

When the positively charged nitrogen came from the ligand, tyrosine side chains were its most common aromatic partner (156 interactions), followed by phenylalanine (59) and tryptophan (24) (Fig. S12†). This analysis will help in the interpretation of difficult SAR and may serve as a knowledge base for the improvement of scoring functions used in virtual screening.
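The terminal-versus-ε comparison for arginine can be sketched by assigning each nearby anionic oxygen to its closest guanidinium nitrogen and tallying the result. This is a minimal illustration, not the study's procedure: it assumes a 4.0 Å N···O cutoff and uses the standard PDB atom names NH1/NH2 (terminal nitrogens) and NE (ε nitrogen); the function name and the toy coordinates are hypothetical.

```python
import math
from collections import Counter

# Illustrative N···O cutoff (assumption); the study's criterion is not stated here.
SALT_BRIDGE_CUTOFF = 4.0  # Å

# PDB names of the terminal guanidinium nitrogens; the remaining one, NE, is the ε nitrogen.
TERMINAL_N = {"NH1", "NH2"}

Point = tuple[float, float, float]


def count_by_nearest_nitrogen(arg_nitrogens: dict[str, Point],
                              anionic_oxygens: list[Point],
                              cutoff: float = SALT_BRIDGE_CUTOFF) -> Counter:
    """Assign each anionic oxygen within the cutoff to its nearest Arg nitrogen
    and tally terminal (NH1/NH2) versus epsilon (NE) contacts."""
    counts: Counter = Counter()
    for oxygen in anionic_oxygens:
        name, dist = min(((n, math.dist(xyz, oxygen)) for n, xyz in arg_nitrogens.items()),
                         key=lambda pair: pair[1])
        if dist <= cutoff:  # oxygens beyond the cutoff are not counted as salt-bridge partners
            counts["terminal" if name in TERMINAL_N else "epsilon"] += 1
    return counts


if __name__ == "__main__":
    # Toy coordinates (hypothetical), not taken from a real structure.
    arg = {"NE": (0.0, 0.0, 0.0), "NH1": (1.2, 1.0, 0.0), "NH2": (1.2, -1.0, 0.0)}
    oxygens = [(2.0, 2.5, 0.0), (2.0, -2.5, 0.0), (-2.8, 0.0, 0.0), (10.0, 0.0, 0.0)]
    print(count_by_nearest_nitrogen(arg, oxygens))  # terminal: 2, epsilon: 1
```

Summing such tallies over all arginine residues in a structure set gives the kind of terminal-versus-ε density contrast described above.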

