Abstract

SCOPe (Structural Classification of Proteins—extended, http://scop.berkeley.edu) is a database of relationships between protein structures that extends the Structural Classification of Proteins (SCOP) database. SCOP is an expert-curated ordering of domains from the majority of proteins of known structure in a hierarchy according to structural and evolutionary relationships. SCOPe classifies the majority of protein structures released since SCOP development concluded in 2009, using a combination of manual curation and highly precise automated tools, aiming to have the same accuracy as fully hand-curated SCOP releases. SCOPe also incorporates and updates the ASTRAL compendium, which provides several databases and tools to aid in the analysis of the sequences and structures of proteins classified in SCOPe. SCOPe continues high-quality manual classification of new superfamilies, a key feature of SCOP. Artifacts such as expression tags are now separated into their own class, in order to distinguish them from the homology-based annotations in the remainder of the SCOPe hierarchy. SCOPe 2.06 contains 77,439 Protein Data Bank entries, double the 38,221 structures classified in SCOP.

Highlights

  • Most proteins have structural similarities with other proteins and, in many of these cases, share a common evolutionary origin

  • We use a combination of manual curation and a rigorously validated software pipeline [5] to add new structures from the Protein Data Bank (PDB) [6,7], and we have developed software to identify errors in Structural Classification of Proteins (SCOP), which are corrected in new releases of SCOPe

  • Manual curation of superfamilies is a key feature of SCOPe, in which proteins with similar three-dimensional (3D) structure and no recognizable sequence similarity are divided into homologs and possible analogs at the superfamily level on the basis of the expert biological insight of human curators

Summary

Background

Most proteins have structural similarities with other proteins and, in many of these cases, share a common evolutionary origin. We use a combination of manual curation and a rigorously validated software pipeline [5] to add new structures from the Protein Data Bank (PDB) [6,7], and we have developed software to identify errors in SCOP, which are corrected in new releases of SCOPe. SCOPe is backward compatible with SCOP, providing the same parseable files and a history of changes between all stable SCOP and SCOPe releases. SCOPe 2.06 (February 2016) added a new class (Artifacts) outside of the main SCOPe hierarchy (i.e., the first seven classes) in order to record cloning artifacts, such as expression tags, that we could identify in the solved structures based on sequence data and metadata annotations. Including such artifacts in the classified domains can result in spurious similarity between non-homologous sequences, so their removal from the main hierarchy results in more accurate representative ASTRAL subsets. With the current releases of SCOPe, we aim to best meet the inferred needs of SCOP users [11], focusing on a classification consistent with that developed over the past 22 years, while maintaining outstanding classification accuracy and being as comprehensive as possible.
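As a rough illustration of what keeping artifacts outside the main hierarchy means for downstream analysis, the sketch below filters domain records down to the first seven classes. It assumes that each domain carries a dotted classification string of the form class.fold.superfamily.family (for example `a.1.1.1`) and that the first seven classes are lettered `a` through `g`. The record layout, the domain identifiers, and the function names are hypothetical and are not taken from the SCOPe distribution.

```python
# Assumption: the first seven (main-hierarchy) classes are lettered a through g.
MAIN_CLASSES = set("abcdefg")


def class_letter(sccs: str) -> str:
    """Return the class part of a dotted classification string such as 'a.1.1.1'."""
    return sccs.split(".", 1)[0]


def main_hierarchy_domains(domains):
    """Keep only (domain_id, sccs) pairs whose class is in the main hierarchy."""
    return [(sid, sccs) for sid, sccs in domains if class_letter(sccs) in MAIN_CLASSES]


# Hypothetical records, made up for illustration only.
example = [
    ("dom1", "a.1.1.1"),
    ("dom2", "l.1.1.1"),   # a class outside a-g, e.g. one holding artifacts
    ("dom3", "c.37.1.8"),
]

kept = main_hierarchy_domains(example)
```

A sequence comparison run on `kept` alone would not see segments filed under classes outside the main hierarchy, which is the kind of separation the paragraph above describes.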

Manual Curation
Example of a new superfamily
Example of superfamily merging
Example of domain splitting
Artifact Removal
Automated Classification Protocol
Findings
New Website