Abstract

The Protein Data Bank (PDB) is the single worldwide archive of experimentally determined three-dimensional (3D) structures of proteins and nucleic acids. As of January 2017, the PDB housed more than 125,000 structures and was growing by more than 11,000 structures annually. Because the 3D structure of a protein is vital for understanding the mechanisms of biological processes and diseases, and for drug design, correct oligomeric assembly information is of critical importance. Unfortunately, the biologically relevant oligomeric form of a 3D structure is not directly obtainable by X-ray crystallography, whereas for solution methods (NMR or single-particle EM) it is known from the experiment. For crystal structures, this information may instead be provided by the PDB Depositor as metadata coming from additional experiments, be inferred by sequence-sequence comparisons with similar proteins of known oligomeric state, or be predicted using software such as PISA (Proteins, Interfaces, Structures and Assemblies) or EPPIC (Evolutionary Protein Protein Interface Classifier). Despite significant efforts by professional PDB Biocurators during data deposition, there remain a number of structures in the archive with incorrect quaternary structure descriptions (or annotations). Further investigation is therefore needed to evaluate the correctness of quaternary structure annotations. In this study, we aim to identify the most probable oligomeric states for proteins represented in the PDB. Our approach evaluated the performance of four independent prediction methods: text mining of primary publications, inference from homologous protein structures, and two computational methods (PISA and EPPIC). Aggregating predictions to give consensus results outperformed all four of the independent prediction methods, yielding 83% correct, 9% wrong, and 8% inconclusive predictions when tested with a well-curated benchmark dataset. We have developed a freely available web-based tool to make this approach accessible to researchers and PDB Biocurators (http://quatstruct.rcsb.org/).
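
The idea of aggregating the four predictions into a consensus can be illustrated with a minimal sketch. The abstract does not state the actual aggregation rule, so the plurality vote below, the `min_votes` threshold, the tie handling, and all names (`consensus_state`, the method keys) are assumptions made for illustration only, not the authors' implementation. Each method reports an oligomeric state as a subunit count (1 for monomer, 2 for dimer, and so on) or `None` when it makes no prediction.

```python
from collections import Counter
from typing import Dict, Optional


def consensus_state(predictions: Dict[str, Optional[int]],
                    min_votes: int = 2) -> Optional[int]:
    """Return the plurality oligomeric state, or None if inconclusive.

    Hypothetical rule (an assumption, not the published method):
    the most frequent prediction wins if at least `min_votes` methods
    agree on it and no other state ties with it.
    """
    # Ignore methods that gave no prediction.
    votes = Counter(p for p in predictions.values() if p is not None)
    if not votes:
        return None

    ranked = votes.most_common(2)
    top_state, top_count = ranked[0]

    # Too little agreement: treat as inconclusive.
    if top_count < min_votes:
        return None
    # A tie between the two most frequent states is also inconclusive.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None
    return top_state


if __name__ == "__main__":
    example = {"text_mining": 2, "homology": 2, "pisa": 4, "eppic": 2}
    print(consensus_state(example))
```

Returning `None` for ties and weak agreement mirrors the "inconclusive" category reported alongside correct and wrong predictions; a real pipeline might instead weight methods by their measured accuracy.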

Highlights

  • The Protein Data Bank (PDB, pdb.org) [1] provides detailed information about the three-dimensional (3D) structures of biological macromolecules, including proteins and nucleic acids

  • During the course of this effort, we developed an efficient approach to evaluate oligomeric states of protein structures in the PDB, which we have made freely available to both PDB Biocurators and researchers as a web-based tool

  • The 3D structures of macromolecules are vital for drug design and development studies, especially in structure-based drug design


Summary

Introduction

The Protein Data Bank (PDB, pdb.org) [1] provides detailed information about the three-dimensional (3D) structures of biological macromolecules, including proteins and nucleic acids. The majority (~90%) of PDB structures were determined by X-ray crystallography. This experimental method yields 3D atomic-level structures of the so-called asymmetric unit (Fig 1A), which is the repeating unit that makes up the crystal (Fig 1B). Knowledge of the 3D structure of the asymmetric unit and of the intermolecular interactions among asymmetric units is not sufficient to reveal conclusively the oligomeric structures of protein assemblies, because it is often not possible to distinguish biologically relevant intermolecular contacts from contacts that merely stabilize the crystal lattice.

