MUFOLD-DB: a processed protein structure database for protein structure prediction and analysis.

Zhiquan He, Jingfen Zhang, Shuai Zeng, Dong Xu, Chao Zhang, Yang Xu

doi:10.1186/1471-2164-15-s11-s2

Abstract

Background: Protein structure data in the Protein Data Bank (PDB) are widely used in studies of protein function and evolution and in protein structure prediction. However, there are two main barriers to large-scale use of PDB data: 1) PDB data are highly redundant in terms of sequence and structure similarity; and 2) many PDB files have issues caused by inconsistent data and standards as well as missing residues, so automated retrieval and analysis are often difficult.

Description: To address these issues, we have created MUFOLD-DB (http://mufold.org/mufolddb.php), a web-based database that collects and processes the weekly PDB files, thereby providing users with non-redundant, cleaned and partially predicted structure data. For each of the non-redundant sequences, we annotate the SCOP domain classification and predict the structures of missing regions by loop modelling. In addition, evolutionary information, secondary structure, disordered regions, and the processed three-dimensional structure are computed and visualized to help users better understand the protein.

Conclusions: MUFOLD-DB integrates processed PDB sequence and structure data with multiple computational results, provides a friendly interface for users to retrieve, browse and download these data, and offers several useful functions to facilitate users' data operations.
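The missing-residue problem mentioned in the abstract can be illustrated with a minimal sketch. It is not the MUFOLD-DB pipeline: it only assumes standard fixed-column PDB `ATOM` records (chain ID in column 22, residue sequence number in columns 23-26) and flags jumps in residue numbering within each chain. The function name `find_numbering_gaps` is hypothetical, and a numbering jump is only a heuristic; insertion codes and deliberately non-contiguous numbering are ignored here.

```python
from collections import defaultdict


def find_numbering_gaps(pdb_text):
    """Return {chain_id: [(last_seen, next_seen), ...]} for numbering jumps.

    Assumption: pdb_text holds fixed-column PDB-format lines. Only ATOM
    records are inspected; HETATM records and insertion codes are ignored.
    """
    residues = defaultdict(list)
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        chain_id = line[21]          # column 22: chain identifier
        res_seq = int(line[22:26])   # columns 23-26: residue sequence number
        seen = residues[chain_id]
        # Record each residue number once, in order of appearance.
        if not seen or seen[-1] != res_seq:
            seen.append(res_seq)

    gaps = {}
    for chain_id, seen in residues.items():
        # A jump larger than 1 between consecutive residues suggests a gap.
        chain_gaps = [(a, b) for a, b in zip(seen, seen[1:]) if b - a > 1]
        if chain_gaps:
            gaps[chain_id] = chain_gaps
    return gaps
```

A chain whose residues are numbered 1, 2, 5, 6 would be reported with the single gap `(2, 5)`. A production check would more likely compare the `SEQRES` records against the residues that have coordinates, or read `REMARK 465`.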

Highlights

Protein structure data in the Protein Data Bank (PDB) [1] are widely used in studies of protein function and evolution, and they serve as a basis for protein structure prediction.
The second barrier to large-scale use of PDB data is that many PDB files have issues caused by inconsistent data and standards as well as missing residues, so automated retrieval and analysis are often difficult.
We introduce MUFOLD-DB, which comprehensively integrates processed PDB data, predicted SCOP classification and additional computational data, e.g. DSSP [7] secondary structure and PSI-BLAST [8] sequence profiles.

Summary

Conclusions

As of January 13, 2014, the data of MUFOLD-DB are summarized in Tables 2 and 3. The data cover 2863 SCOP families, 1551 super-families and 960 folds. MUFOLD-DB will be continuously maintained and updated. Future work includes better handling of residue modifications and missing coordinates.

Availability

The database is publicly available at http://mufold.org/mufolddb.php.

Authors' contributions

MUFOLD-DB is the joint work of the authors as a team. ZH wrote all the scripts and tools to download and process the data and to create the database tables. The database and web server were designed and built by CZ and ZH. YX developed and implemented the protocol for SCOP classification and performance analysis. JZ and DX provided guidance and made critical design decisions during development. All authors contributed to the analyses and discussions.




Journal: BMC Genomics	Publication Date: Dec 1, 2014
Citations: 29	License type: cc-by


