Analysis of pattern overlaps and exact computation of P-values of pattern occurrences numbers: case of Hidden Markov Models.

Mireille Régnier, Mikhail Roytberg, Victor Yakovlev, Evgenia Furletova

doi:10.1186/s13015-014-0025-1

Abstract

Background
Finding new functional fragments in biological sequences is a challenging problem. Methods addressing this problem commonly search for clusters of pattern occurrences that are statistically significant. A measure of statistical significance is the P-value of a number of pattern occurrences, i.e. the probability of finding at least S occurrences of words from a pattern H in a random text of length N generated according to a given probability model. All words of the pattern are assumed to have the same length.

Results
We present a novel algorithm, SufPref, that computes an exact P-value for Hidden Markov Models (HMMs). The algorithm is based on recursive equations on text sets related to pattern occurrences; the equations can be used for any probability model. The algorithm inductively traverses a specific data structure, an overlap graph. The nodes of the graph are associated with the overlaps of words from H. The edges are associated with the prefix and suffix relations between overlaps. A novel feature of our data structure is that the pattern H need not be explicitly represented in nodes or leaves. The algorithm relies on the Cartesian product of the overlap graph and the graph of HMM states; this approach is analogous to the automaton approach from JBCB 4: 553-569. The reduced size of the SufPref data structure leads to significant improvements in space and time complexity compared to existing algorithms. SufPref was implemented as a C++ program that can be used both as a Web server and as a stand-alone program for Linux and Windows. The program interface admits special formats to describe probability models of various types (HMM, Bernoulli, Markov); a pattern can be described as a list of words, a position-specific scoring matrix (PSSM), a degenerate pattern, or a word and a number of mismatches. It is available at http://server2.lpm.org.ru/bio/online/sf/.
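The Cartesian-product recursion described above can be illustrated with a generic forward dynamic program. This is a minimal sketch, not SufPref itself. It assumes a plain prefix automaton (all prefixes of words of H) in place of the overlap graph; every overlap is a prefix, so this automaton has at least as many nodes as the overlap set. It also assumes an HMM in which each hidden state emits one letter and then transitions, and uses the hypothetical helper names `build_prefix_automaton` and `hmm_pvalue`. Occurrence counts are capped at S, because only "at least S" matters.

```python
from collections import defaultdict


def build_prefix_automaton(pattern, alphabet):
    """Hypothetical helper (NOT SufPref's overlap graph): the states are all
    prefixes of words of the pattern, including the empty string."""
    prefixes = {w[:i] for w in pattern for i in range(len(w) + 1)}
    delta = {}
    for state in prefixes:
        for a in alphabet:
            t = state + a
            # Longest suffix of state+a that is a prefix of some word;
            # terminates because the empty string is always a prefix.
            while t not in prefixes:
                t = t[1:]
            delta[(state, a)] = t
    return delta


def hmm_pvalue(pattern, alphabet, init, trans, emit, n, s):
    """P(a text of length n contains >= s occurrences of words of `pattern`).

    Assumed HMM convention: the first hidden state is drawn from `init`;
    hidden state q emits letter a with probability emit[q][a] and then
    moves to q2 with probability trans[q][q2].
    """
    words = set(pattern)
    m = len(next(iter(words)))
    assert all(len(w) == m for w in words), "all words must have one length"
    delta = build_prefix_automaton(words, alphabet)

    # dist[(q, node, c)]: probability that after i letters the next hidden
    # state is q, the automaton is in `node`, and min(count, s) == c.
    # The (q, node) pairs are the Cartesian product of the two graphs.
    dist = defaultdict(float)
    for q, p in init.items():
        dist[(q, "", 0)] += p
    for _ in range(n):
        nxt = defaultdict(float)
        for (q, node, c), p in dist.items():
            for a, e in emit[q].items():
                if p * e == 0.0:
                    continue
                node2 = delta[(node, a)]
                # With equal-length words, a word ends here iff node2 is a word.
                c2 = min(s, c + (node2 in words))
                for q2, t in trans[q].items():
                    nxt[(q2, node2, c2)] += p * e * t
        dist = nxt
    return sum(p for (_, _, c), p in dist.items() if c >= s)


if __name__ == "__main__":
    # A Bernoulli model is the one-state special case of an HMM.
    init = {"q": 1.0}
    trans = {"q": {"q": 1.0}}
    emit = {"q": {"a": 0.5, "b": 0.5}}
    # Texts of length 3 with >= 1 occurrence of "aa": aaa, aab, baa.
    print(hmm_pvalue({"aa"}, "ab", init, trans, emit, n=3, s=1))  # 0.375
```

In this naive form the state space is (hidden state, automaton node, capped count), so each of the N steps costs on the order of |Q|^2 · |nodes| · |alphabet| · S operations; shrinking the node set is where a smaller structure such as an overlap graph pays off.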
The program was applied to compare the sensitivity and specificity of methods for TFBS (transcription factor binding site) prediction based on P-values computed for Bernoulli models, Markov models of orders one and two, and HMMs. The experiments show that the methods have approximately the same quality.

Electronic supplementary material
The online version of this article (doi:10.1186/s13015-014-0025-1) contains supplementary material, which is available to authorized users.
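The comparison covers Bernoulli models, Markov models of orders one and two, and HMMs. One standard way to see that the simpler models fit the HMM setting is to encode an order-1 Markov chain as an HMM whose hidden state is the letter it emits. This is a minimal sketch under that assumption; the helper names `markov1_to_hmm`, `hmm_text_prob` and `markov1_text_prob` are hypothetical and not part of the SufPref interface. An order-2 chain can be handled the same way by using pairs of letters as hidden states.

```python
def markov1_to_hmm(start, trans):
    """Hypothetical helper: encode an order-1 Markov chain as an HMM.

    Each hidden state is the letter it emits (deterministic emission), so the
    hidden-state graph is the Markov chain itself.
    """
    init = dict(start)
    hmm_trans = {a: dict(row) for a, row in trans.items()}
    emit = {a: {b: (1.0 if a == b else 0.0) for b in start} for a in start}
    return init, hmm_trans, emit


def hmm_text_prob(text, init, trans, emit):
    """Forward algorithm: probability that the HMM emits `text` (non-empty)."""
    alpha = {q: init[q] * emit[q].get(text[0], 0.0) for q in init}
    for x in text[1:]:
        alpha = {
            q2: sum(alpha[q] * trans[q].get(q2, 0.0) for q in alpha)
            * emit[q2].get(x, 0.0)
            for q2 in init
        }
    return sum(alpha.values())


def markov1_text_prob(text, start, trans):
    """Direct order-1 Markov probability: start[x1] * prod trans[x_i][x_i+1]."""
    p = start[text[0]]
    for a, b in zip(text, text[1:]):
        p *= trans[a][b]
    return p


if __name__ == "__main__":
    # Illustrative parameters, not taken from the paper.
    start = {"a": 0.3, "b": 0.7}
    trans = {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.4, "b": 0.6}}
    init, htrans, emit = markov1_to_hmm(start, trans)
    print(hmm_text_prob("abab", init, htrans, emit),
          markov1_text_prob("abab", start, trans))
```

Both functions return the same probability for every text (up to floating-point rounding), which is why a P-value algorithm for HMMs also covers Bernoulli and Markov models.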

Highlights

Finding new functional fragments in biological sequences is a challenging problem
Hidden Markov models (HMMs) have been considered in only a few papers [5,6], despite being widely used in bioinformatics
This work presents an approach to computing the P-value of multiple pattern occurrences within a randomly generated text of a given length
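The quantity being computed can be stated directly by enumerating every text of length N. The following naive sketch does this for a Bernoulli model (letters drawn independently). It takes |A|^N time, so it is useful only as a reference for checking an exact algorithm on tiny inputs. It assumes that overlapping occurrences are counted separately, and the names `count_occurrences` and `bernoulli_pvalue_bruteforce` are hypothetical.

```python
from itertools import product


def count_occurrences(text, pattern):
    """Number of positions where some word of `pattern` (all of length m)
    ends; overlapping occurrences are counted separately (assumption)."""
    m = len(next(iter(pattern)))
    return sum(text[i:i + m] in pattern for i in range(len(text) - m + 1))


def bernoulli_pvalue_bruteforce(pattern, probs, n, s):
    """P(at least s occurrences) summed over all len(probs)**n texts of
    length n, with letters drawn independently from `probs`."""
    total = 0.0
    for letters in product(probs, repeat=n):
        text = "".join(letters)
        if count_occurrences(text, pattern) >= s:
            p = 1.0
            for a in letters:
                p *= probs[a]
            total += p
    return total


if __name__ == "__main__":
    uniform = {"a": 0.5, "b": 0.5}
    # Texts of length 3 with >= 1 occurrence of "aa": aaa, aab, baa.
    print(bernoulli_pvalue_bruteforce({"aa"}, uniform, n=3, s=1))  # 0.375
```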

Summary

Introduction

Finding new functional fragments in biological sequences is a challenging problem. Methods addressing this problem commonly search for clusters of pattern occurrences that are statistically significant. Many functionally significant fragments are characterized by a set of specific words, called a pattern and denoted H below. Hidden Markov models (HMMs) have been considered in only a few papers [5,6], despite being widely used in bioinformatics. This motivates the development of methods for P-value calculation with respect to HMMs.
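For a pattern H given as a set of equal-length words, the overlaps of its words can be enumerated naively. This sketch assumes that an overlap is a string that is both a proper prefix of some word of H and a proper suffix of some word of H (possibly the same word), with the empty string included. For each overlap it also records its longest proper prefix and longest proper suffix that are themselves overlaps, as one possible reading of the prefix and suffix relations mentioned in the abstract. The exact node and edge definitions of SufPref's overlap graph may differ, and `overlaps` and `overlap_graph` are hypothetical names.

```python
def overlaps(pattern):
    """Strings that are a proper prefix of some word and a proper suffix of
    some word of `pattern` (the empty string is always included)."""
    prefixes = {w[:i] for w in pattern for i in range(len(w))}
    suffixes = {w[i:] for w in pattern for i in range(1, len(w) + 1)}
    return prefixes & suffixes


def overlap_graph(pattern):
    """Return the overlap set plus, for each non-empty overlap, its longest
    proper prefix and longest proper suffix that are also overlaps."""
    ov = overlaps(pattern)
    prefix_parent, suffix_parent = {}, {}
    for o in ov:
        if o == "":
            continue
        # The empty string is an overlap, so next() always finds a match.
        prefix_parent[o] = next(o[:i] for i in range(len(o) - 1, -1, -1)
                                if o[:i] in ov)
        suffix_parent[o] = next(o[i:] for i in range(1, len(o) + 1)
                                if o[i:] in ov)
    return ov, prefix_parent, suffix_parent


if __name__ == "__main__":
    print(sorted(overlaps({"abab"})))  # ['', 'ab']
    ov, pre, suf = overlap_graph({"abab", "baba"})
    print(pre["aba"], suf["aba"])  # ab ba
```

The prefix and suffix relations can differ for the same node ("aba" above), which is why two kinds of edges are needed.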

Journal: Algorithms for Molecular Biology (AMB)
Publication Date: Dec 1, 2014
Citations: 31
License type: CC BY 4.0

Similar Papers

Hiddenness Control of Hidden Markov Models and Application to Objective Speech Quality and Isolated-Word Speech Recognition
Gaurav Talwar ... Hongkang Liang
01 Jan 2006

Hybrid modeling, HMM/NN architectures, and protein applications.
Pierre Baldi ... Yves Chauvin
Neural computation | VOL. 8
01 Oct 1996

Zipf exponent of trajectory distribution in the hidden Markov model
V V Bochkarev ... E Yu Lerner
Journal of Physics: Conference Series | VOL. 490
11 Mar 2014

Modelling state-transition dynamics in resting-state brain signals by the hidden Markov and Gaussian mixture models.
Takahiro Ezaki ... Takamitsu Watanabe
European Journal of Neuroscience | VOL. 54
22 Jul 2021
