Abstract

The Hidden Markov Model (HMM) is an important tool for analyzing and modeling biological data; it is used for gene prediction, protein secondary structure prediction, and other essential tasks. An HMM is a stochastic process in which a hidden Markov chain, called the chain of states, emits a sequence of observations. Using this sequence, various questions about the underlying emission generation scheme can be addressed. Applying an HMM to a particular problem is an attempt to infer which state in the chain emitted each observation; this is usually called posterior decoding. In general, the emissions are assumed to be conditionally independent of each other given the states. In this work we relax that assumption and consider certain dependencies among the states and emissions, with a focus on the Markov property. Specifically, we assume that the probability of observing an emission depends not only on the current state but also on the previous state and one of the previous emissions. We also use additional environmental information by classifying amino acids into three groups according to their Relative Solvent Accessibility (RSA). We investigate how this modification changes the standard algorithms for ordinary HMMs and introduce modified Viterbi and Forward-Backward algorithms for the new model. We apply the proposed model to a real dataset for protein secondary structure prediction and demonstrate improved accuracy compared with the ordinary HMM. In particular, the overall accuracy of our modified HMM, which uses the RSA information, is 63.95%. This is 5.9% higher than the prediction accuracy obtained with an ordinary HMM on the same dataset, and 4% higher than that of a modified HMM that accounts only for the dependencies among the emissions.
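The dependency structure described in the abstract can be sketched by contrasting it with the ordinary HMM factorization. The notation here is an assumption, not taken from the paper: \pi is the initial state distribution, a the transition probabilities, b the emission probabilities, and k >= 1 a hypothetical fixed lag standing in for "one of the previous emissions". How the first positions of the sequence are handled is not stated in this excerpt and is left open.

```latex
% Ordinary HMM: joint probability of states q_{1:T} and emissions o_{1:T}
P(o_{1:T}, q_{1:T}) = \pi_{q_1}\, b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t)

% Modified emission term (assumed notation; k >= 1 is a hypothetical lag)
b_{q_t}(o_t) = P(o_t \mid q_t)
\;\longrightarrow\;
P(o_t \mid q_t,\, q_{t-1},\, o_{t-k})
```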

Highlights

  • N_H, N_S, and N_C denote the numbers of correctly predicted residues in the H, S, and C states, respectively, and N is the total number of amino acids
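The formula that these counts belong to is not included in this excerpt. A minimal sketch, assuming the overall accuracy is the usual three-state percentage (N_H + N_S + N_C) / N x 100, might look like the following. The function name and the string encoding of the states are hypothetical choices made for illustration.

```python
def overall_accuracy(true_states, predicted_states):
    """Return (accuracy in percent, per-state correct counts).

    Assumes each position is labelled "H", "S", or "C" and that overall
    accuracy is (N_H + N_S + N_C) / N * 100; this definition is an
    assumption, since the original formula is not shown in the excerpt.
    """
    if len(true_states) != len(predicted_states):
        raise ValueError("sequences must have the same length")
    if len(true_states) == 0:
        raise ValueError("sequences must not be empty")

    # N_H, N_S, N_C: correctly predicted positions for each state
    correct = {"H": 0, "S": 0, "C": 0}
    for truth, guess in zip(true_states, predicted_states):
        if truth == guess and truth in correct:
            correct[truth] += 1

    n_total = len(true_states)  # N: total number of amino acids
    accuracy = 100.0 * sum(correct.values()) / n_total
    return accuracy, correct


if __name__ == "__main__":
    acc, counts = overall_accuracy("HHSSCC", "HHSCCC")
    print(f"{acc:.2f}%", counts)
```

Returning the per-state counts alongside the percentage makes it easy to see which of the three states contributes most to the errors.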

Introduction

A hidden Markov model (HMM) is a statistical tool used to model a stochastic sequence. It corresponds to a Markov chain in which every state emits observations according to a density function. Using an HMM, an observed sequence is modeled as the output of a discrete stochastic process that is itself hidden. HMMs are widely used in biological sequence analysis and bioinformatics, including protein structure prediction. Won et al. [9,10] used genetic algorithms to optimize the topology of an HMM for secondary structure prediction. Karplus et al. [17] introduced a new HMM-based server for protein structure prediction. The server provides a large number of intermediate results, which are often interesting in their own right: multiple sequence alignments (MSAs) of putative homologs, predictions of local structure features, lists of potential templates of known structure, alignments to templates, and residue-residue contact
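To make the ordinary HMM concrete, here is a minimal sketch of posterior decoding with the scaled forward-backward recursions for a discrete-emission model. It illustrates the standard textbook algorithm, not the modified algorithms proposed in the paper. The function name, the argument layout, and the toy parameters are all assumptions made for illustration.

```python
def posterior_decode(pi, A, B, obs):
    """Posterior decoding for an ordinary discrete HMM (illustrative sketch).

    pi[i]   : probability that the chain starts in state i
    A[i][j] : probability of moving from state i to state j
    B[i][o] : probability that state i emits symbol o
    obs     : list of observed symbol indices
    Returns (most probable state at each position, posterior matrix).
    """
    K, T = len(pi), len(obs)
    alpha = [[0.0] * K for _ in range(T)]
    beta = [[1.0] * K for _ in range(T)]
    scale = [0.0] * T

    # Forward pass, rescaled at every step to avoid numerical underflow
    for t in range(T):
        for j in range(K):
            if t == 0:
                prior = pi[j]
            else:
                prior = sum(alpha[t - 1][i] * A[i][j] for i in range(K))
            alpha[t][j] = prior * B[j][obs[t]]
        scale[t] = sum(alpha[t])
        alpha[t] = [a / scale[t] for a in alpha[t]]

    # Backward pass, reusing the forward scaling factors
    for t in range(T - 2, -1, -1):
        for i in range(K):
            beta[t][i] = sum(
                A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(K)
            ) / scale[t + 1]

    # Posterior P(state at t = k | all observations), normalized per position
    gamma = []
    for t in range(T):
        row = [alpha[t][k] * beta[t][k] for k in range(K)]
        total = sum(row)
        gamma.append([g / total for g in row])

    path = [max(range(K), key=row.__getitem__) for row in gamma]
    return path, gamma


if __name__ == "__main__":
    # Toy two-state model with made-up parameters
    pi = [0.5, 0.5]
    A = [[0.9, 0.1], [0.1, 0.9]]
    B = [[0.9, 0.1], [0.1, 0.9]]
    path, gamma = posterior_decode(pi, A, B, [0, 0, 1, 1])
    print(path)
```

Posterior decoding picks the most probable state at each position independently, whereas the Viterbi algorithm returns the single most probable state path; the two can disagree.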
