Abstract

The manual classification of protein domains is approaching its 20th anniversary. ECOD is our mixed manual-automatic domain classification. Over time, the types of proteins that require manual curation have changed. Depositions with complex multidomain and multichain arrangements are commonplace. Transmembrane domains are regularly classified. Repeatedly, domains initially believed to be novel are found to have homologous links to existing classified domains. Here we present a brief summary of recent manual curation efforts in ECOD, combined with specific case studies of transmembrane and multidomain proteins in which manual curation was useful for discovering new homologous relationships. We present a new taxonomy for the classification of ABC transporter transmembrane domains. We examine alternate topologies of the leucine-specific (LS) domain of Leucine tRNA-synthetase. Finally, we elaborate on a distant homologous link between two helical dimerization domains.

Highlights

  • The classification of protein structures deposited in the Protein Data Bank (PDB) increasingly involves complexes, transmembrane proteins, and multidomain proteins with non-globular internal repeats [1]

  • Since the release of ECOD in 2014, we have pursued a policy of weekly updates, using a combined automated pipeline and manual curation workflow

  • Although the authors considered Caprin-1 DD to have no structural similarity to existing structures, we found that it exhibits significant structural similarity to the previously published PAN3 DD structures

Introduction

The classification of protein structures deposited in the PDB increasingly involves complexes, transmembrane proteins, and multidomain proteins with non-globular internal repeats [1]. This trend is partly due to improvements in structure determination by cryo-electron microscopy and in the determination of transmembrane protein structures by X-ray crystallography [2, 3]. With the advent of covariation-based structure prediction, there are likely few remaining soluble, globular protein structures that cannot be predicted computationally [4, 5]. Those structures that are targeted for structural determination and cannot be classified tend to be transmembrane and/or large multidomain structures participating in a protein complex. Although the number of such unpredictable proteins is small, they can be expected to appear disproportionately as targets for manual curators in any knowledge-based structural protein classification.
