ToPS: A Framework to Manipulate Probabilistic Models of Sequence Data

André Yoshiaki Kashiwabara, Alan Mitchell Durham, Rafael Mathias, Ígor Bonadio, Vitor Onuchic, Felipe Amado

doi:10.1371/journal.pcbi.1003234

Open Access

https://doi.org/10.1371/journal.pcbi.1003234

Abstract

Discrete Markovian models can be used to characterize patterns in sequences of values and have many applications in biological sequence analysis, including gene prediction, CpG island detection, alignment, and protein profiling. We present ToPS, a computational framework that can be used to implement different applications in bioinformatics analysis by combining eight kinds of models: (i) independent and identically distributed process; (ii) variable-length Markov chain; (iii) inhomogeneous Markov chain; (iv) hidden Markov model; (v) profile hidden Markov model; (vi) pair hidden Markov model; (vii) generalized hidden Markov model (GHMM); and (viii) similarity-based sequence weighting. The framework includes functionality for training, simulation, and decoding of the models. Additionally, it provides two methods to help with parameter setting: the Akaike and Bayesian information criteria (AIC and BIC). The models can be used stand-alone, combined in Bayesian classifiers, or included in more complex, multi-model probabilistic architectures using GHMMs. In particular, the framework provides a novel, flexible implementation of decoding in GHMMs that detects when the architecture can be traversed efficiently.
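To illustrate how information criteria such as AIC and BIC can help with parameter setting, the following is a minimal Python sketch that selects the order of a fixed-order Markov chain. It is not ToPS code and does not use the ToPS API; the function names (`log_likelihood`, `n_params`, `select_order`) and the DNA alphabet default are assumptions made for this example. It uses the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of free parameters and n the number of scored symbols.

```python
import math
from collections import Counter


def log_likelihood(seqs, order, skip):
    """Maximized log-likelihood of a fixed-order Markov chain.

    The first `skip` symbols of each sequence are used only as context,
    so that every candidate order is scored on the same observations.
    Returns (log-likelihood, number of scored symbols).
    """
    context_counts = Counter()
    transition_counts = Counter()
    for s in seqs:
        for i in range(skip, len(s)):
            ctx = s[i - order:i]  # empty string when order == 0
            context_counts[ctx] += 1
            transition_counts[(ctx, s[i])] += 1
    ll = 0.0
    for (ctx, _sym), n in transition_counts.items():
        # Maximum-likelihood estimate of P(sym | ctx) is n / count(ctx).
        ll += n * math.log(n / context_counts[ctx])
    return ll, sum(context_counts.values())


def n_params(order, alphabet_size):
    """Free parameters: one distribution per context, each with |A| - 1."""
    return alphabet_size ** order * (alphabet_size - 1)


def aic(ll, k):
    return 2 * k - 2 * ll


def bic(ll, k, n):
    return k * math.log(n) - 2 * ll


def select_order(seqs, max_order, alphabet="ACGT"):
    """Return (best order by AIC, best order by BIC, all scores)."""
    scores = {}
    for order in range(max_order + 1):
        ll, n = log_likelihood(seqs, order, skip=max_order)
        k = n_params(order, len(alphabet))
        scores[order] = (aic(ll, k), bic(ll, k, n))
    best_aic = min(scores, key=lambda o: scores[o][0])
    best_bic = min(scores, key=lambda o: scores[o][1])
    return best_aic, best_bic, scores


if __name__ == "__main__":
    # Toy data: a strictly periodic sequence, fully explained by order 1.
    best_aic, best_bic, _ = select_order(["ACGT" * 50], max_order=2)
    print("order chosen by AIC:", best_aic)
    print("order chosen by BIC:", best_bic)
```

Both criteria penalize the parameter count, which grows exponentially with the order. BIC's penalty also grows with the amount of data, so it tends to favor smaller models than AIC.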

Highlights

Markov models of nucleic acids and proteins are widely used in bioinformatics
Another example is the Variable Length Markov Chain in which the user has to set a parameter that controls the pruning of the probabilistic suffix tree
Scripts, configuration files and sequence data to reproduce the experiments are available through the ToPS homepage

Summary

Introduction

Examples of applications include ab initio gene prediction [1], CpG island detection [2], protein family characterization [3], and sequence alignment [4]. Often these models are hard-coded in the analysis software, which means that well-known algorithms are implemented over and over again. A system providing a wide range of these models is important to allow researchers to quickly select the most appropriate model for analyzing sequences from different problem domains. In some cases, such as gene prediction, the characterization of the family of sequences may involve using various probabilistic models integrated in a single architecture. Another alternative is a general-purpose system that can implement different models, such as gHMM [8], HTK [9], HMMoC [10] and HMMConverter [11], N-SCAN [12] and Tigrscan [13] (known as Genezilla).

Journal: PLoS Computational Biology
Publication Date: Oct 3, 2013
Citations: 7
License type: CC BY 4.0
