Abstract
Background
Protein secondary structure prediction methods based on probabilistic models such as the hidden Markov model (HMM) appeal to many because they provide meaningful information relevant to the sequence-structure relationship. However, at present, the prediction accuracy of pure HMM-type methods is much lower than that of machine learning-based methods such as neural networks (NN) or support vector machines (SVM).
Results
In this paper, we report a new probabilistic method for protein secondary structure prediction, based on dynamic Bayesian networks (DBN). The new method models the PSI-BLAST profile of a protein sequence using a multivariate Gaussian distribution, and simultaneously takes into account the dependency between the profile and secondary structure and the dependency between profiles of neighboring residues. In addition, a segment length distribution is introduced for each secondary structure state. Tests show that the DBN method significantly improves accuracy compared with other pure HMM-type methods. Further improvement is achieved by combining the DBN with an NN, a method called DBNN, which shows better Q3 accuracy than many popular methods and is competitive with the current state of the art. The most interesting feature of DBN/DBNN is that a significant improvement in prediction accuracy is achieved when it is combined with other methods by a simple consensus.
Conclusion
The DBN method, using a Gaussian distribution for the PSI-BLAST profile and a high-order dependency between profiles of neighboring residues, produces significantly better prediction accuracy than other HMM-type probabilistic methods. Owing to their different nature, the DBN and NN combine to form a more accurate method, DBNN. Future improvement may be achieved by combining DBNN with an SVM-type method.
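The abstract does not say how the "simple consensus" is formed. As a minimal sketch, assuming a per-residue majority vote over three-state predictions (H, E, C), it could look like the following. The function name `consensus` and the tie-breaking rule (prefer the label from the earliest-listed predictor) are illustrative assumptions, not the authors' procedure.

```python
from collections import Counter


def consensus(predictions: list[str]) -> str:
    """Combine equal-length secondary structure strings by per-residue majority vote.

    Assumption: ties go to the label of the earliest-listed predictor
    among the tied labels. The paper's actual scheme may differ.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")

    result = []
    for column in zip(*predictions):  # labels for one residue, one per predictor
        counts = Counter(column)
        top = max(counts.values())
        # First label in predictor order that reaches the top count.
        winner = next(label for label in column if counts[label] == top)
        result.append(winner)
    return "".join(result)
```

For example, `consensus(["HHEC", "HEEC", "CHEC"])` votes column by column and keeps the label that most predictors agree on.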
Highlights
Protein secondary structure prediction methods based on probabilistic models such as the hidden Markov model (HMM) appeal to many because they provide meaningful information relevant to the sequence-structure relationship
Further improvement is achieved by combining the dynamic Bayesian network (DBN) with a neural network (NN), a method named DBNN, which achieves better Q3 accuracy than many other popular methods and is competitive with the current state of the art
The most interesting feature of DBN/DBNN is that a significant improvement in prediction accuracy is achieved when it is combined with other methods by a simple consensus
Summary
Protein secondary structure prediction methods based on probabilistic models such as the hidden Markov model (HMM) appeal to many because they provide meaningful information relevant to the sequence-structure relationship. The prediction accuracy of protein secondary structure has improved, largely due to the successful application of machine learning tools such as neural networks (NN) and support vector machines (SVM). Rost and Sander introduced the alignment profile from multiple sequence alignment into the prediction. Their method, named PHD, performed much better than previous ones. SVM-based methods were also developed for protein secondary structure prediction, first taking the alignment profile as input and later improved to use the PSI-BLAST profile [8,9,10,11,12]. The Q3 of a modern NN- or SVM-based method can reach over 76%.
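Q3 is the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. A minimal sketch of that calculation, assuming predictions and observed structures are given as equal-length strings (the function name `q3_accuracy` is illustrative):

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the percentage of residues with a correctly predicted three-state label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    # Count positions where the predicted label matches the observed one.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

For instance, a prediction of `"HHEC"` against an observed `"HHCC"` matches at three of four residues, giving a Q3 of 75%.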