Abstract

The shape string is a structural sequence and an extremely important representation of protein backbone conformations. Nuclear magnetic resonance chemical shifts correlate strongly with local protein structure and are exploited, in conjunction with computational approaches, to predict protein structures. Here we demonstrate a novel approach, NMRDSP, which can accurately predict the protein shape string based on nuclear magnetic resonance chemical shifts and structural profiles obtained from sequence data. NMRDSP uses six chemical shifts (HA, H, N, CA, CB and C) and eight elements of structure profiles as features, a non-redundant set (1,003 entries) as the training set, and a conditional random field as the classification algorithm. For an independent testing set (203 entries), we achieved an accuracy of 75.8% for S8 (the eight-state accuracy) and 87.8% for S3 (the three-state accuracy). These accuracies are higher than those obtained using chemical shifts or sequence data alone, confirming that the chemical shift and the structure profile are significant features for shape string prediction and that their combination markedly improves the accuracy of the predictor. We have constructed the NMRDSP web server and believe it could provide a solid platform for predicting other protein structures and functions. The NMRDSP web server is freely available at http://cal.tongji.edu.cn/NMRDSP/index.jsp.
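To make the two reported metrics concrete, the following minimal Python sketch computes a per-residue eight-state accuracy (S8) and a three-state accuracy (S3) for one predicted shape string. It is an illustration only, not the NMRDSP implementation. The function name `state_accuracy`, the example strings, and in particular the grouping of the eight symbols into three classes (`ASSUMED_S3_MAP`) are assumptions made here; the text above does not spell out that mapping.

```python
from typing import Dict, Optional

# ASSUMPTION: an illustrative grouping of eight shape-string symbols into
# three classes. The abstract does not define this mapping.
ASSUMED_S3_MAP: Dict[str, str] = {
    "S": "B", "R": "B", "U": "B", "V": "B",
    "K": "A", "A": "A",
    "T": "O", "G": "O",
}


def state_accuracy(predicted: str, observed: str,
                   mapping: Optional[Dict[str, str]] = None) -> float:
    """Fraction of residues whose predicted state equals the observed state.

    If `mapping` is given, both strings are first collapsed through it,
    so the same function yields a reduced-state accuracy.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("shape strings must not be empty")
    if mapping is not None:
        predicted = "".join(mapping[s] for s in predicted)
        observed = "".join(mapping[s] for s in observed)
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


# Hypothetical example strings (not data from the paper).
observed = "SSRAAAKT"
predicted = "SRRAAKKG"

s8 = state_accuracy(predicted, observed)                  # eight-state accuracy
s3 = state_accuracy(predicted, observed, ASSUMED_S3_MAP)  # three-state accuracy
print(f"S8 = {s8:.3f}, S3 = {s3:.3f}")  # S8 = 0.625, S3 = 1.000
```

In this toy case the three mismatches all fall within the same assumed class, so S3 is higher than S8, which mirrors why a three-state accuracy is generally at least as high as the eight-state one.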

Highlights

  • Nuclear Magnetic Resonance (NMR) is a well-established technique that allows the determination of three-dimensional biological macromolecule structures in solution

  • We demonstrate a new approach, NMRDSP, which is an extension of DSP and can more accurately predict the protein shape string based on NMR chemical shifts (CSs) and structural profiles obtained from sequence data

  • Data sets of chemical shifts and protein shape strings: all of the NMR CS data used in NMRDSP were retrieved from the Biological Magnetic Resonance Bank (BMRB) database [23] as of 2013


Summary

Introduction

Nuclear Magnetic Resonance (NMR) is a well-established technique that allows the determination of three-dimensional biological macromolecule structures in solution. Raman et al. showed that structures could be accurately determined by incorporating backbone chemical shifts (CSs), residual dipolar couplings, and amide proton distances into the Rosetta protein structure modeling methodology [4]. In these studies, NMR CSs were used indirectly as structural restraints to reduce the search space. Many studies have demonstrated that protein secondary structures can be accurately predicted from NMR CSs and sequence data. We demonstrate a new approach, NMRDSP, which is an extension of DSP and can more accurately predict the protein shape string based on NMR CSs and structural profiles obtained from sequence data. The results confirm that the NMR CS and the structural profile are significant features for the prediction of the shape string and that their combination significantly improves the accuracy of the predictor.

