Abstract

Background: Recently we surveyed the dark proteome, i.e., regions of proteins never observed by experimental structure determination and inaccessible to homology modelling. Surprisingly, we found that most of the dark proteome could not be accounted for by conventional explanations (e.g., intrinsic disorder, transmembrane domains, and compositional bias), and that nearly half of the dark proteome comprised dark proteins, in which the entire sequence lacked similarity to any known structure. In this paper we present the Dark Proteome Database (DPD) and associated web services that provide access to updated information about the dark proteome.

Results: We assembled DPD from several external web resources (primarily Aquaria and Swiss-Prot) and stored it in a relational database currently containing ~10 million entries and occupying ~2 GB of disk space. This database comprises two key tables: one giving information on the ‘darkness’ of each protein, and a second that breaks each protein into dark and non-dark regions. In addition, a second version of the database also uses information from the Protein Model Portal (PMP) to determine darkness. To provide access to DPD, we implemented a web server that exposes all underlying data as well as functional analyses derived from these data.

Conclusions: Availability of this database and its web service will help focus future structural and computational biology efforts to study the dark proteome, thus providing a basis for understanding a wide variety of biological functions that currently remain unknown.

Availability and implementation: DPD is available at http://darkproteome.ws. The complete database is also available upon request. Data use is permitted via the Creative Commons Attribution-NonCommercial 4.0 International license (http://creativecommons.org/licenses/by-nc/4.0/).

Highlights

  • We surveyed the dark proteome, i.e., regions of proteins never observed by experimental structure determination and inaccessible to homology modelling

  • Availability and implementation: Dark Proteome Database (DPD) is available at http://darkproteome.ws

  • About half of all protein sequences are detectably similar to proteins with known three-dimensional (3D) structure, and some 3D structural information can be inferred by homology modelling [3, 4] (source: Perdigão et al., BioData Mining (2017) 10:24)


Summary

Introduction

We surveyed the dark proteome, i.e., regions of proteins never observed by experimental structure determination and inaccessible to homology modelling. Two key databases in protein biochemistry are UniProt [1], which records protein sequences, and the Protein Data Bank (PDB) [2], which records three-dimensional (3D) structural models derived from experiments such as X-ray crystallography. Comparing the two by number of entries, in 2017 UniProt held more than 65 million sequences, whereas the PDB held only ~125,000 3D structures. Aquaria is derived by systematically comparing all PDB proteins against 546,000 Swiss-Prot sequences [1], which essentially cover all well-described protein sequences across a wide range of organisms. This comparison produced 46 million sequence-to-structure alignments in the PSSH2 database [6], yielding at least one matching structure for 87% of Swiss-Prot proteins and a median of 35 structures per protein, a depth of sequence-to-structure information not currently available from other resources.
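To make the idea of dark and non-dark regions concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that each sequence-to-structure alignment can be reduced to a 1-based, inclusive residue interval on the protein, that any residue covered by at least one interval counts as non-dark, and that a darkness value can be taken as the fraction of residues left uncovered. The function names `split_regions` and `darkness` are hypothetical and do not refer to the DPD schema or web service.

```python
from typing import List, Tuple

Interval = Tuple[int, int]       # (start, end), 1-based and inclusive
Region = Tuple[int, int, bool]   # (start, end, is_dark)


def split_regions(length: int, aligned: List[Interval]) -> List[Region]:
    """Partition residues 1..length into dark and non-dark regions.

    Assumption: a residue is non-dark if any aligned interval covers it.
    """
    if length <= 0:
        raise ValueError("protein length must be positive")

    # Clip intervals to the sequence and drop those that fall outside it.
    clipped = sorted(
        (max(1, s), min(length, e))
        for s, e in aligned
        if s <= e and e >= 1 and s <= length
    )

    # Merge overlapping or directly adjacent intervals.
    merged: List[Interval] = []
    for s, e in clipped:
        if merged and s <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))

    # Gaps between merged intervals are the dark regions.
    regions: List[Region] = []
    pos = 1
    for s, e in merged:
        if s > pos:
            regions.append((pos, s - 1, True))
        regions.append((s, e, False))
        pos = e + 1
    if pos <= length:
        regions.append((pos, length, True))
    return regions


def darkness(length: int, aligned: List[Interval]) -> float:
    """Fraction of residues in dark regions (illustrative definition)."""
    dark = sum(e - s + 1 for s, e, is_dark in split_regions(length, aligned) if is_dark)
    return dark / length


if __name__ == "__main__":
    # Hypothetical 100-residue protein with two overlapping alignments.
    print(split_regions(100, [(10, 40), (30, 60)]))
    print(darkness(100, [(10, 40), (30, 60)]))
```

Under these assumptions, a protein with no aligned intervals yields a single dark region spanning the whole sequence, which corresponds to the notion of a dark protein described above.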

