Abstract

Many proteins function by interacting with small molecules (ligands). Identifying ligand-binding sites (LBSs) in proteins can therefore help infer their molecular functions. A comprehensive comparison among the local structures of LBSs was previously performed to understand their relationships and to classify their structural motifs. However, a similarly exhaustive comparison among the local surfaces of LBSs (patches) has never been performed because of its computational complexity. To enhance our understanding of LBSs, it is worth comparing patches and classifying them based on the similarities of their surface configurations and electrostatic potentials. In this study, we first developed a rapid method to compare two patches. We then clustered the patches corresponding to the same PDB chemical component identifier for a ligand and selected a representative patch from each cluster. We subsequently compared the representative patches exhaustively and clustered them using a similarity score, PatSim. Finally, the resultant PatSim scores were compared with the similarities of the atomic structures of the LBSs and with those of the ligand-binding protein sequences and functions. As a result, we classified the patches into ∼2000 well-characterized clusters. We found that about 63% of these clusters are found in identical protein folds, whereas about 25% of the clusters are conserved in distantly related proteins and even in proteins with cross-fold similarity. Furthermore, we showed that patches with higher PatSim scores have the potential to be involved in similar biological processes.
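The per-ligand clustering and representative-selection steps described above can be illustrated with a minimal Python sketch. It assumes a precomputed, symmetric matrix of pairwise patch similarities in [0, 1] that stands in for PatSim (whose definition is not given here). It also uses average-linkage agglomerative clustering, a hypothetical stopping threshold, and the medoid as the representative. These are illustrative choices, not necessarily the linkage, cutoff, or representative criterion used in the study.

```python
"""Sketch: cluster the patches of one ligand ID and pick representatives.

Assumptions (hypothetical, not the study's exact method):
- sim[i][j] is a precomputed similarity in [0, 1] standing in for PatSim;
- clustering is average-linkage agglomerative with a stopping threshold;
- the representative of a cluster is its medoid.
"""
from itertools import combinations


def average_linkage_clusters(sim, threshold):
    """Merge clusters while the best average cross-cluster similarity >= threshold."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            # Average similarity over all member pairs across the two clusters.
            total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
            score = total / (len(clusters[a]) * len(clusters[b]))
            if score > best:
                best, pair = score, (a, b)
        if best < threshold:
            break  # no remaining pair of clusters is similar enough to merge
        a, b = pair  # a < b, so deleting index b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def medoid(cluster, sim):
    """Return the member with the highest mean similarity to the other members."""
    if len(cluster) == 1:
        return cluster[0]
    return max(
        cluster,
        key=lambda i: sum(sim[i][j] for j in cluster if j != i) / (len(cluster) - 1),
    )


if __name__ == "__main__":
    # Toy similarity matrix: patches 0-2 resemble each other, patch 3 does not.
    sim = [
        [1.0, 0.9, 0.8, 0.1],
        [0.9, 1.0, 0.7, 0.2],
        [0.8, 0.7, 1.0, 0.1],
        [0.1, 0.2, 0.1, 1.0],
    ]
    groups = average_linkage_clusters(sim, threshold=0.5)
    print(groups, [medoid(g, sim) for g in groups])
```

With the toy matrix and a threshold of 0.5, the sketch yields the clusters [[0, 1, 2], [3]] with representatives 0 and 3. The naive merge loop scales poorly with the number of patches; for large sets, a library routine such as SciPy's hierarchical clustering with average linkage would be the practical choice.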

Highlights

  • For hierarchical clustering of patches within each chemical component identifier, the atomic coordinates of 148,199 ligands for 8283 ligand IDs (4794 of which had more than two sets of atomic coordinates) were obtained from 33,053 PDB entries (PDB IDs) with molecular surfaces available at eF-site.[34]

  • A total of 141,194 patches, each composed of more than 20 vertices, were obtained for 7851 ligand IDs from 31,950 PDB IDs; 4538 of these ligand IDs had more than two patches (Supporting Information Table SII).
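The filtering criteria quoted in the highlights (keep patches with more than 20 vertices, then count ligand IDs that have more than two patches) can be expressed as a short Python sketch. The tuple layout `(ligand_id, pdb_id, n_vertices)` and the function and key names are hypothetical. The eF-site data format is not described here.

```python
"""Sketch of the patch-filtering and counting criteria from the highlights.

Hypothetical data layout: each patch is a (ligand_id, pdb_id, n_vertices)
tuple. Thresholds mirror the text: a patch is kept only if it has MORE than
20 vertices, and ligand IDs are counted if they keep MORE than two patches.
"""
from collections import Counter

MIN_VERTICES = 20  # a kept patch must have more than this many vertices
MIN_PATCHES = 2    # counted ligand IDs must have more than this many patches


def filter_patches(patches):
    """Keep only patches built from more than MIN_VERTICES vertices."""
    return [p for p in patches if p[2] > MIN_VERTICES]


def summarize(patches):
    """Count kept patches, ligand IDs, PDB IDs, and ligand IDs with >2 patches."""
    kept = filter_patches(patches)
    per_ligand = Counter(ligand for ligand, _pdb, _n in kept)
    return {
        "patches": len(kept),
        "ligand_ids": len(per_ligand),
        "pdb_ids": len({pdb for _ligand, pdb, _n in kept}),
        "ligand_ids_gt2_patches": sum(
            1 for count in per_ligand.values() if count > MIN_PATCHES
        ),
    }
```

Using strict `>` comparisons follows the wording "more than 20 vertices" and "more than two patches". A patch with exactly 20 vertices is therefore dropped.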

Introduction

Despite the rapid growth of genetic information and of the number of protein structures deposited in the Protein Data Bank (PDB),[1] the functions of many proteins are not clearly understood. The functions of such proteins have often been assigned by analogy to their homologs with known functions, because proteins with highly similar sequences and structures tend to be evolutionarily related and to have similar functions.[2,3,4,5,6,7] However, homologous proteins are not always available, and some proteins with similar sequences but dissimilar structures exist due to …

Abbreviations: EPot, electrostatic potentials; LBS, ligand-binding site; Patch, local molecular surface of LBS; PatSim, patch similarity; PDB, Protein Data Bank; psize, patch area size; SeqSim, sequence similarity; UFK, UniProt functional keyword.
