Learning sparse models for a dynamic Bayesian network classifier of protein secondary structure.

Zafer Aydin, Ajit Singh, Jeff Bilmes, William S Noble

doi:10.1186/1471-2105-12-154

Open Access

https://doi.org/10.1186/1471-2105-12-154

Abstract

Background: Protein secondary structure prediction provides insight into protein function and is a valuable preliminary step for predicting the 3D structure of a protein. Dynamic Bayesian networks (DBNs) and support vector machines (SVMs) have been shown to provide state-of-the-art performance in secondary structure prediction. As the size of the protein database grows, it becomes feasible to use a richer model in an effort to capture subtle correlations among the amino acids and the predicted labels. In this context, it is beneficial to derive sparse models that discourage over-fitting and provide biological insight.

Results: In this paper, we first show that we are able to obtain accurate secondary structure predictions. Our per-residue accuracy on a well established and difficult benchmark (CB513) is 80.3%, which is comparable to the state-of-the-art evaluated on this dataset. We then introduce an algorithm for sparsifying the parameters of a DBN. Using this algorithm, we can automatically remove up to 70-95% of the parameters of a DBN while maintaining the same level of predictive accuracy on the SD576 set. At 90% sparsity, we are able to compute predictions three times faster than a fully dense model evaluated on the SD576 set. We also demonstrate, using simulated data, that the algorithm is able to recover true sparse structures with high accuracy, and using real data, that the sparse model identifies known correlation structure (local and non-local) related to different classes of secondary structure elements.

Conclusions: We present a secondary structure prediction method that employs dynamic Bayesian networks and support vector machines. We also introduce an algorithm for sparsifying the parameters of the dynamic Bayesian network. The sparsification approach yields a significant speed-up in generating predictions, and we demonstrate that the amino acid correlations identified by the algorithm correspond to several known features of protein secondary structure.

Datasets and source code used in this study are available at http://noble.gs.washington.edu/proj/pssp.

Highlights

Protein secondary structure prediction provides insight into protein function and is a valuable preliminary step for predicting the 3D structure of a protein
The results of the seven-fold cross-validation are summarized in Table 1, including the amino acid level accuracy, the segment overlap score (SOV) [25], and Matthews correlation coefficients (MCC) [26]
Recovery of true sparse model structures: We have demonstrated that the sparse learning procedure proposed in the “Learning a sparse model for a dynamic Bayesian network (DBN)” section yields a model that provides highly accurate predictions
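
Two of the evaluation measures named in the highlights above, the amino acid level (per-residue) accuracy and the Matthews correlation coefficient, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names, the three-letter label alphabet (H for helix, E for strand, C for coil), and the example strings are assumptions made purely for illustration, and the MCC is computed one class at a time in the usual one-versus-rest form. The segment overlap score (SOV) is not covered here.

```python
from math import sqrt


def per_residue_accuracy(true_labels: str, pred_labels: str) -> float:
    """Fraction of residues whose predicted label equals the true label."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label strings must have the same length")
    if not true_labels:
        raise ValueError("label strings must not be empty")
    matches = sum(t == p for t, p in zip(true_labels, pred_labels))
    return matches / len(true_labels)


def matthews_cc(true_labels: str, pred_labels: str, label: str) -> float:
    """One-versus-rest Matthews correlation coefficient for a single class."""
    tp = fp = tn = fn = 0
    for t, p in zip(true_labels, pred_labels):
        if p == label and t == label:
            tp += 1
        elif p == label:
            fp += 1
        elif t == label:
            fn += 1
        else:
            tn += 1
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, report 0.0 when the coefficient is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical 11-residue example: one helix residue is mispredicted as coil.
true_ss = "CHHHHCCEEEC"
pred_ss = "CHHHCCCEEEC"

accuracy = per_residue_accuracy(true_ss, pred_ss)
mcc_by_class = {c: matthews_cc(true_ss, pred_ss, c) for c in "HEC"}
print(f"accuracy = {accuracy:.3f}")
for c, value in mcc_by_class.items():
    print(f"MCC({c}) = {value:.3f}")
```

Reporting one MCC per class alongside the overall accuracy is useful because the three classes are usually unbalanced, so a high accuracy can hide weak performance on the rarer class.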

Summary

Introduction

Protein secondary structure prediction provides insight into protein function and is a valuable preliminary step for predicting the 3D structure of a protein. As the size of the protein database grows, it becomes feasible to use a richer model in an effort to capture subtle correlations among the amino acids and the predicted labels. In this context, it is beneficial to derive sparse models that discourage over-fitting and provide biological insight. Protein secondary structure provides a useful intermediate representation between the primary amino acid sequence and the full three-dimensional structure. The earliest method for secondary structure prediction [1] used a neural network to achieve a base-level predictive accuracy of 64.3% from a dataset of 106 labeled proteins. State-of-the-art methods achieve accuracies in the range of 77-80% on a variety of published benchmark datasets [4].

Journal: BMC Bioinformatics	Publication Date: May 13, 2011
Citations: 83	License type: cc-by

Similar Papers

Prediction of secondary structural content of proteins from their amino acid composition alone. II. The paradox with secondary structural class.
Frank Eisenhaber ... Cornelius Frömmel
Proteins | VOL. 25 | 01 Jun 1996

A dynamic Bayesian network approach to protein secondary structure prediction
Xin-Qiu Yao ... Huaiqiu Zhu
BMC Bioinformatics | VOL. 9 | 25 Jan 2008

Secondary and Tertiary Structure Prediction of Proteins: A Bioinformatic Approach
Minu Kesheri ... Rajeshwar Prasad Sinha
30 Nov 2014

Analyzing the Interplay Between Secondary and Tertiary Structure Predictions in Folding Simulations with a Genetic Algorithm
Thomas Dandekar ... Fuli Du
Journal of Molecular Modeling | VOL. 5 | 01 Apr 1999
