Abstract

Structural information on the interactions of proteins with other molecules is plentiful, and for some proteins and protein families, there may be hundreds of available structures. It can be very difficult for a scientist who is not trained in structural bioinformatics to access this information comprehensively. Previously, we developed the Protein Common Interface Database (ProtCID), which provided clusters of the interfaces of full-length protein chains as a means of identifying biological assemblies. Because proteins consist of domains that act as modular functional units, we have extended the analysis in ProtCID to the level of individual domains. This has greatly increased the number of large protein-protein interface clusters in ProtCID, enabling the generation of hypotheses on the structures of biological assemblies for many systems. The analysis of domain families also allows us to extend ProtCID to the interactions of domains with peptides, nucleic acids, and ligands. ProtCID provides complete annotations and coordinate sets for every cluster.
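Moving from full-length chains to individual domains means an interface between two chains has to be broken down into the domain pairs that actually make contact. The sketch below shows one simple way to do this, assuming each chain's Pfam domains are given as inclusive residue ranges and the interface is given as a list of residue-residue contacts. The names `domain_of` and `domain_pair_contacts` are hypothetical illustrations, not part of ProtCID.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical domain record: (Pfam name, first residue, last residue), inclusive.
Domain = Tuple[str, int, int]


def domain_of(residue: int, domains: List[Domain]) -> Optional[str]:
    """Return the Pfam name of the domain containing `residue`, or None (e.g. a linker)."""
    for name, start, end in domains:
        if start <= residue <= end:
            return name
    return None


def domain_pair_contacts(
    contacts: List[Tuple[int, int]],
    domains_a: List[Domain],
    domains_b: List[Domain],
) -> Counter:
    """Count residue contacts for each (domain in chain A, domain in chain B) pair.

    Contacts involving residues outside any annotated domain are ignored.
    """
    counts: Counter = Counter()
    for res_a, res_b in contacts:
        dom_a = domain_of(res_a, domains_a)
        dom_b = domain_of(res_b, domains_b)
        if dom_a is not None and dom_b is not None:
            counts[(dom_a, dom_b)] += 1
    return counts


# Usage with made-up boundaries: residue 165 lies in a linker, so its contact is dropped.
domains_a = [("SH3", 1, 60), ("SH2", 70, 160)]
domains_b = [("Pkinase", 1, 300)]
print(domain_pair_contacts([(10, 50), (15, 55), (100, 200), (165, 10)], domains_a, domains_b))
```

Keying by Pfam name is a simplification: a chain that contains two copies of the same family would need a per-domain index to keep the copies apart.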

Highlights

  • Structural information on the interactions of proteins with other molecules is plentiful, and for some proteins and protein families, there may be hundreds of available structures

  • We use our database, PDBfam[7], which contains 8636 protein domain families (Pfams) observed within the Protein Data Bank (PDB)

  • Each PDB chain is annotated with a Pfam architecture, defined as the ordered sequence of Pfams along the chain, e.g., (SH3)_(SH2)_(Pkinase)
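The Pfam architecture described in the highlights can be sketched as a string built from a chain's domains sorted by their position along the sequence. The code below assumes each domain is given as a (Pfam name, start residue) pair; the functions `pfam_architecture` and `group_by_architecture` and the chain identifiers in the example are hypothetical and are not PDBfam's actual implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def pfam_architecture(domains: List[Tuple[str, int]]) -> str:
    """Build an architecture string such as '(SH3)_(SH2)_(Pkinase)'.

    `domains` holds (Pfam name, start residue) pairs in any order; they are
    sorted by start residue so the string runs from N- to C-terminus.
    """
    ordered = sorted(domains, key=lambda d: d[1])
    return "_".join(f"({name})" for name, _ in ordered)


def group_by_architecture(
    chains: Dict[str, List[Tuple[str, int]]]
) -> Dict[str, List[str]]:
    """Group chain identifiers that share the same Pfam architecture."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for chain_id, domains in chains.items():
        groups[pfam_architecture(domains)].append(chain_id)
    return dict(groups)


# Usage with made-up chain identifiers.
print(pfam_architecture([("Pkinase", 250), ("SH3", 85), ("SH2", 150)]))
```

Grouping chains by architecture in this way gives the sets of comparable chains within which interfaces could then be compared.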


Summary

Introduction

Structural information on the interactions of proteins with other molecules is plentiful, and for some proteins and protein families, there may be hundreds of available structures. We have shown that if a homodimeric or heterodimeric interface is present in multiple crystal forms, especially when the proteins in the different crystals are homologous but not identical, the interface is very likely to be part of a biologically relevant assembly[4]. To enable this form of analysis, we previously developed PDBfam[7], which assigns protein domain families (as defined by Pfam[8]) to every protein sequence in the PDB, and the Protein Common Interface Database (ProtCID), which compares and clusters the interfaces of pairs of full-length protein chains with defined Pfam domain architectures across different PDB entries[9]. ProtCID provides coordinates and PyMOL scripts for visualizing the interfaces in each cluster.
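The reasoning above, that an interface recurring in several crystal forms is likely to be biological, can be illustrated by counting, for each interface cluster, how many distinct crystal forms and distinct proteins it contains. This is a minimal sketch that assumes interfaces have already been assigned to clusters; the `Interface` record, its fields, and the threshold `min_forms=2` are hypothetical choices for illustration, not ProtCID's actual data model or scoring.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Interface:
    cluster_id: int    # precomputed cluster assignment (hypothetical)
    pdb_entry: str     # PDB entry the interface comes from
    crystal_form: str  # hypothetical label, e.g. space group plus unit cell
    protein_id: str    # hypothetical identifier distinguishing homologs


def cluster_support(interfaces: List[Interface]) -> Dict[int, Tuple[int, int]]:
    """Map each cluster to (number of distinct crystal forms, number of distinct proteins)."""
    forms: Dict[int, Set[str]] = defaultdict(set)
    proteins: Dict[int, Set[str]] = defaultdict(set)
    for iface in interfaces:
        forms[iface.cluster_id].add(iface.crystal_form)
        proteins[iface.cluster_id].add(iface.protein_id)
    return {cid: (len(forms[cid]), len(proteins[cid])) for cid in forms}


def likely_biological(interfaces: List[Interface], min_forms: int = 2) -> List[int]:
    """Return cluster IDs observed in at least `min_forms` distinct crystal forms."""
    support = cluster_support(interfaces)
    return sorted(cid for cid, (n_forms, _) in support.items() if n_forms >= min_forms)


# Usage with made-up entries: cluster 1 recurs in two crystal forms, cluster 2 in one.
ifaces = [
    Interface(1, "1AAA", "P212121_a", "P1"),
    Interface(1, "2BBB", "C2_a", "P2"),
    Interface(1, "3CCC", "P212121_a", "P1"),
    Interface(2, "1AAA", "P212121_a", "P1"),
]
print(cluster_support(ifaces))
print(likely_biological(ifaces))
```

Counting distinct proteins alongside crystal forms reflects the point that support from homologous but non-identical proteins is especially informative; how the two counts should be weighed is left open here.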
