Abstract

The Structure Integration with Function, Taxonomy and Sequences resource (SIFTS; http://pdbe.org/sifts/) was established in 2002 and continues to operate as a collaboration between the Protein Data Bank in Europe (PDBe; http://pdbe.org) and the UniProt Knowledgebase (UniProtKB; http://uniprot.org). The resource is instrumental in the transfer of annotations between protein structure and protein sequence resources through provision of up-to-date residue-level mappings between entries from the PDB and from UniProtKB. SIFTS also incorporates residue-level annotations from other biological resources, currently comprising the NCBI taxonomy database, IntEnz, GO, Pfam, InterPro, SCOP, CATH, PubMed, Ensembl, Homologene and automatic Pfam domain assignments based on HMM profiles. The recently released implementation of SIFTS includes support for multiple cross-references for proteins in the PDB, allowing mappings to UniProtKB isoforms and UniRef90 cluster members. This development makes structure data in the PDB readily available to over 1.8 million UniProtKB accessions.
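To illustrate how a residue-level mapping lets annotations move from structure numbering to sequence numbering, here is a minimal Python sketch. It is not the SIFTS implementation or its file format: the `Segment` class, the function names, and all residue numbers and labels are hypothetical, and it assumes each mapped segment is a gap-free stretch in which the two numbering schemes run in parallel.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    """Hypothetical contiguous stretch where PDB and UniProtKB numbering run in parallel."""
    pdb_start: int      # first PDB residue number in the segment
    pdb_end: int        # last PDB residue number in the segment (inclusive)
    uniprot_start: int  # UniProtKB position aligned to pdb_start


def pdb_to_uniprot(segments: list[Segment], pdb_residue: int) -> Optional[int]:
    """Return the UniProtKB position for a PDB residue, or None if it is unmapped."""
    for seg in segments:
        if seg.pdb_start <= pdb_residue <= seg.pdb_end:
            return seg.uniprot_start + (pdb_residue - seg.pdb_start)
    return None


def transfer_annotations(segments: list[Segment], pdb_annotations: dict[int, str]) -> dict[int, str]:
    """Re-key residue annotations from PDB numbering to UniProtKB numbering."""
    transferred = {}
    for pdb_residue, label in pdb_annotations.items():
        position = pdb_to_uniprot(segments, pdb_residue)
        if position is not None:  # skip residues with no sequence counterpart
            transferred[position] = label
    return transferred


# Made-up data: two mapped segments with an unmapped gap (residues 51-59).
segments = [Segment(1, 50, 101), Segment(60, 80, 151)]
annotations = {10: "binding site", 55: "linker", 65: "catalytic residue"}
result = transfer_annotations(segments, annotations)
print(result)
```

The annotation at the unmapped residue 55 is dropped rather than guessed, which reflects the cautious choice of transferring only where a residue-level correspondence exists.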

Highlights

  • The rapid evolution in genetic sequencing over the past decades is leading to unprecedented growth in the number of protein sequences available in the UniProt Knowledgebase (UniProtKB; http://uniprot.org), a universal resource for sequence and functional information pertaining to proteins [1].

  • Since 2018, SIFTS has been incorporated into the Protein Data Bank in Europe (PDBe) Knowledge Base resource (PDBe-KB; http://pdbe-kb.org).

  • This limitation was overcome in the most recent SIFTS infrastructure update by organising the Protein Data Bank (PDB)-UniProtKB cross-references into three categories: (i) mapping to a UniProt canonical protein sequence, unchanged from the previous implementation; (ii) mapping to all alternative isoforms of the canonical sequence; and (iii) mapping to sequences in UniRef90 clusters.
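The three cross-reference categories named above can be pictured as a simple grouping of accessions per PDB chain. The following Python sketch shows only that idea; the `MappingCategory` enum, the `group_by_category` helper, and the placeholder accession strings are assumptions made for illustration and do not reflect the SIFTS schema or real UniProtKB accessions.

```python
from collections import defaultdict
from enum import Enum


class MappingCategory(Enum):
    """Hypothetical labels for the three cross-reference categories."""
    CANONICAL = "canonical"  # (i) UniProt canonical protein sequence
    ISOFORM = "isoform"      # (ii) alternative isoforms of the canonical sequence
    UNIREF90 = "uniref90"    # (iii) sequences in UniRef90 clusters


def group_by_category(cross_refs):
    """Group (accession, category) pairs into a dict keyed by category."""
    grouped = defaultdict(list)
    for accession, category in cross_refs:
        grouped[category].append(accession)
    return dict(grouped)


# Placeholder accessions for one PDB chain; none of these are real identifiers.
cross_refs = [
    ("ACC_CANONICAL", MappingCategory.CANONICAL),
    ("ACC_CANONICAL-2", MappingCategory.ISOFORM),
    ("ACC_CANONICAL-3", MappingCategory.ISOFORM),
    ("ACC_CLUSTER_MEMBER", MappingCategory.UNIREF90),
]
grouped = group_by_category(cross_refs)

for category in MappingCategory:
    print(category.value, grouped.get(category, []))
```

Keeping the canonical mapping as its own category mirrors the statement that it is unchanged from the previous implementation, so existing consumers could ignore the two newer categories.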


Summary

INTRODUCTION

The rapid evolution in genetic sequencing over the past decades is leading to unprecedented growth in the number of protein sequences available in the UniProt Knowledgebase (UniProtKB; http://uniprot.org), a universal resource for sequence and functional information pertaining to proteins [1]. It currently contains over 500 000 manually annotated sequences (UniProtKB/Swiss-Prot) and over 120 million computationally annotated ones (UniProtKB/TrEMBL), despite a nearly 50% reduction in the size of the holdings in 2015 to remove high sequence redundancy. A number of resources use the structure data from the PDB to annotate protein sequences within related families and superfamilies of sequences [6]. Both PDBe and UniProtKB are core resources at the European Bioinformatics Institute (EMBL-EBI; http://www.ebi.ac.uk) [7] and within the context of the ELIXIR infrastructure (http://elixir-europe.org) [8]. Since 2018, SIFTS has been incorporated into the PDBe Knowledge Base resource (PDBe-KB; http://pdbe-kb.org).

