Abstract

Background: Sequences and structures provide valuable complementary information on protein features and functions. However, it is not always straightforward for users to gather information concurrently from the sequence and structure levels. The UniProt knowledgebase (UniProtKB) strives to help users in this undertaking by providing complete cross-references to the Protein Data Bank (PDB) as well as coherent feature annotation using available structural information. In this study, SSMap, a new UniProt-PDB residue-residue level mapping, was generated. The primary objective of this mapping is not only to facilitate the two tasks mentioned above, but also to address a number of shortcomings of existing mappings. SSMap is the first isoform sequence-specific mapping resource and is up-to-date for UniProtKB annotation tasks. The method employed by SSMap differs from the other mapping resources in that it emphasizes the correct reconstruction of the PDB sequence from structures, and the correct attribution of a UniProtKB entry to each PDB chain by using a series of post-processing steps.

Results: SSMap was compared to other existing mapping resources in terms of the correctness of the attribution of PDB chains to UniProtKB entries, and of the quality of the pairwise alignments supporting the residue-residue mapping. It was found that SSMap shared about 80% of its mappings with other mapping sources. New and alternative mappings proposed by SSMap were mostly good, as assessed by manual verification of data subsets. As for local pairwise alignments, it was shown that major discrepancies (both in terms of alignment lengths and boundaries), when present, were often due to differences in the methodologies used for the mappings.

Conclusion: SSMap provides an independent, good quality UniProt-PDB mapping. The systematic comparison conducted in this study allows the further identification of general problems in UniProt-PDB mappings, so that both the coverage and the quality of the mappings can be systematically improved for the benefit of the scientific community. The SSMap mapping is currently used to provide PDB cross-references in UniProtKB.

Highlights

  • Sequences and structures provide valuable complementary information on protein features and functions

  • Information on a protein's primary sequence is comprehensively stored in the UniProt Knowledgebase (UniProtKB), which consists of the automatically annotated UniProtKB/TrEMBL section and the manually annotated UniProtKB/Swiss-Prot section [4]. 3D structures, on the other hand, are archived in the Protein Data Bank (PDB) [5]

  • Users currently face two challenges when gathering useful information on a protein at both the sequence and structure levels. First, they have to identify the exact protein structure which corresponds to their protein of interest; second, they have to know the correspondence between residues on a protein chain in the PDB file and those on the UniProtKB primary sequence

Summary

Introduction

Sequences and structures provide valuable complementary information on protein features and functions. Users currently face two challenges when gathering useful information on a protein at both the sequence and structure levels. First, they have to identify the exact protein structure which corresponds to their protein of interest; second, they have to know the correspondence between residues on a protein chain in the PDB file and those on the UniProtKB primary sequence. Providing an accurate UniProt-PDB mapping down to the residue level is not at all a trivial task.
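
To make the idea of a residue-level correspondence concrete, the sketch below derives a position map from a gapped pairwise alignment between a UniProtKB sequence and a PDB chain sequence. It is a minimal illustration and not the SSMap method: the function name `residue_mapping`, its parameters, and the toy sequences and residue numbers are all hypothetical, and the alignment is assumed to have been computed beforehand by some other tool.

```python
def residue_mapping(aligned_uniprot, aligned_pdb, pdb_residue_ids, uniprot_start=1):
    """Map UniProtKB sequence positions to PDB residue numbers.

    aligned_uniprot, aligned_pdb: equal-length aligned strings, '-' marks a gap.
    pdb_residue_ids: residue numbers of the PDB chain, one per ungapped
        residue in aligned_pdb (PDB numbering need not start at 1 or be contiguous).
    uniprot_start: UniProtKB position of the first ungapped residue in
        aligned_uniprot (hypothetical parameter for local alignments).
    """
    if len(aligned_uniprot) != len(aligned_pdb):
        raise ValueError("aligned sequences must have the same length")
    n_pdb = sum(1 for c in aligned_pdb if c != "-")
    if n_pdb != len(pdb_residue_ids):
        raise ValueError("one PDB residue id is required per aligned PDB residue")

    mapping = {}
    u_pos = uniprot_start  # current UniProtKB position (1-based)
    p_idx = 0              # index into pdb_residue_ids
    for u_char, p_char in zip(aligned_uniprot, aligned_pdb):
        # Only columns where both sequences have a residue yield a mapping.
        if u_char != "-" and p_char != "-":
            mapping[u_pos] = pdb_residue_ids[p_idx]
        if u_char != "-":
            u_pos += 1
        if p_char != "-":
            p_idx += 1
    return mapping


# Toy example (invented data): the PDB chain lacks the first residue,
# has one inserted residue (G) and one unobserved residue (I).
example = residue_mapping(
    aligned_uniprot="MKT-AYIAK",
    aligned_pdb="-KTGAY-AK",
    pdb_residue_ids=[10, 11, 12, 13, 14, 15, 16],
)
print(example)
```

Columns with a gap on either side produce no entry, so UniProtKB positions 1 and 6 and the inserted PDB residue 12 are absent from the result in this toy case.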

