Abstract
Background
Secondary structure prediction is a useful first step toward 3D structure prediction. A number of successful secondary structure prediction methods use neural networks, but neural networks are not intuitively interpretable. Hidden Markov models (HMMs), by contrast, are interpretable graphical models, and they have been used successfully in many bioinformatics applications. Because they offer a strong statistical background and allow model interpretation, we propose a method based on hidden Markov models.

Results
Our HMM is designed without prior knowledge: it is chosen from a collection of models of increasing size, using statistical and accuracy criteria. The resulting model has 36 hidden states: 15 that model α-helices, 12 that model coil and 9 that model β-strands. Connections between hidden states and state emission probabilities reflect the organization of protein structures into secondary structure segments. We first analyze the model's features and show how it offers a new view of local structures. We then use it for secondary structure prediction. Our model performs well on single sequences, with a Q3 score of 68.8%, more than one percentage point above PSIPRED on single sequences. A straightforward extension of the method allows the use of multiple sequence alignments, raising the Q3 score to 75.5%.

Conclusion
The hidden Markov model presented here achieves good prediction accuracy using only a limited number of parameters. It provides an interpretable framework for protein secondary structure architecture. Furthermore, it can be used as a tool for generating protein sequences with a given secondary structure content.
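Because several hidden states can share one secondary structure class (helix, strand or coil), a per-residue three-state prediction has to be read off the model somehow. The sketch below shows one common way to do this, assuming posterior decoding with the forward-backward algorithm: the posterior probabilities of all states sharing a label are summed, and the most probable class is reported at each position. The four-state toy model, its parameters and the names `emission`, `forward_backward` and `predict_classes` are hypothetical illustrations, not the published 36-state model or its exact decoding procedure; the multiple-alignment extension is not shown.

```python
# Hypothetical toy model: each hidden state carries one secondary structure
# label, and several states may share a label (here two helix states).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def emission(favoured, weight=5.0):
    """Toy emission table: residues in `favoured` are `weight` times likelier."""
    raw = {a: (weight if a in favoured else 1.0) for a in AMINO_ACIDS}
    total = sum(raw.values())
    return {a: v / total for a, v in raw.items()}


def forward_backward(seq, init, trans, emit):
    """Posterior P(state_t = k | seq) for every position t and state k."""
    n, K = len(seq), len(init)
    alpha, scale = [], []
    for t, aa in enumerate(seq):
        if t == 0:
            row = [init[k] * emit[k][aa] for k in range(K)]
        else:
            prev = alpha[-1]
            row = [sum(prev[j] * trans[j][k] for j in range(K)) * emit[k][aa]
                   for k in range(K)]
        s = sum(row)  # rescale each forward row to avoid numerical underflow
        scale.append(s)
        alpha.append([x / s for x in row])
    beta = [[1.0] * K for _ in range(n)]
    for t in range(n - 2, -1, -1):
        nxt = seq[t + 1]
        beta[t] = [sum(trans[j][k] * emit[k][nxt] * beta[t + 1][k] for k in range(K))
                   / scale[t + 1] for j in range(K)]
    posterior = []
    for t in range(n):
        row = [alpha[t][k] * beta[t][k] for k in range(K)]
        s = sum(row)  # normalise so each position's posteriors sum to 1
        posterior.append([x / s for x in row])
    return posterior


def predict_classes(seq, init, trans, emit, labels):
    """Sum posteriors over states sharing a label; keep the best class per residue."""
    prediction = []
    for row in forward_backward(seq, init, trans, emit):
        totals = {}
        for k, p in enumerate(row):
            totals[labels[k]] = totals.get(labels[k], 0.0) + p
        prediction.append(max(totals, key=totals.get))
    return "".join(prediction)


labels = ["H", "H", "E", "C"]  # states: helix 1, helix 2, strand, coil
init = [0.25, 0.0, 0.25, 0.5]
trans = [
    [0.0, 0.9, 0.0, 0.1],     # helix 1 -> helix 2 or coil
    [0.8, 0.1, 0.0, 0.1],     # helix 2 -> helix 1, helix 2 or coil
    [0.0, 0.0, 0.8, 0.2],     # strand -> strand or coil
    [0.15, 0.0, 0.15, 0.7],   # coil -> helix 1, strand or coil
]
emit = [emission("AELM"), emission("AEKQ"), emission("VIYFWT"), emission("GPNDS")]

print(predict_classes("GPNDAELMAEKQAELMGPVIVYVTGPN", init, trans, emit, labels))
```

Summing over states rather than taking the single best state lets a class described by many states (such as the 15 helix states of the published model) collect probability mass from all of them.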
Highlights
Secondary structure prediction is a useful first step toward 3D structure prediction
Hidden Markov model selection: the optimal hidden Markov model for secondary structure prediction, referred to as OSS-HMM (Optimal Secondary Structure prediction Hidden Markov Model), was chosen using three criteria: the Q3 achieved in prediction, the Bayesian Information Criterion (BIC) value of the model and the statistical distance between models
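Two of the three selection criteria can be written down compactly: Q3, the percentage of residues whose three-state class is predicted correctly, and the BIC, which trades the log-likelihood of the training data against the number of free parameters. The sketch below assumes the convention BIC = -2 ln L + k ln n (lower is better) and, for the parameter count, a fully connected HMM over the 20 amino acids; the published model's exact convention and parameter count may differ, and the statistical distance between models is not shown. All function names are hypothetical.

```python
import math


def q3(predicted, observed):
    """Percentage of residues whose predicted class (H, E or C) matches the observed one."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def bic(log_likelihood, n_free_params, n_observations):
    """BIC = -2 ln L + k ln n; with this sign convention, lower is better."""
    return -2.0 * log_likelihood + n_free_params * math.log(n_observations)


def hmm_free_params(n_states, alphabet_size=20):
    """Free parameters of a FULLY CONNECTED HMM (an assumption): (K - 1) initial,
    K(K - 1) transition and K(alphabet_size - 1) emission probabilities."""
    k = n_states
    return (k - 1) + k * (k - 1) + k * (alphabet_size - 1)


def select_by_bic(candidates, n_observations):
    """candidates: iterable of (name, log_likelihood, n_free_params) tuples."""
    return min(candidates, key=lambda c: bic(c[1], c[2], n_observations))[0]


print(q3("HHHECCC", "HHHCCCC"))  # 6 of 7 residues match
```

With these helpers, a larger candidate model is preferred only when its gain in log-likelihood outweighs the extra k ln n penalty, which is one way to stop adding hidden states once they no longer pay for themselves.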
The automatic generation of an HMM topology has been previously addressed by several groups [25,26,27,28,29]
Summary
Secondary structure prediction is a useful first step toward 3D structure prediction. Hidden Markov models are interpretable graphical models that have been used successfully in many bioinformatics applications. Because they offer a strong statistical background and allow model interpretation, we propose a method based on hidden Markov models. Secondary structure prediction is used to refine sequence alignments or to improve the detection of distant homologs [1]. It is of prime importance when prediction is made without a template [2]. A survey of the EVA online evaluation [3] shows that the top-performing methods include several …