A linear memory algorithm for Baum-Welch training

István Miklós, Irmtraud M Meyer

doi:10.1186/1471-2105-6-231

Abstract

Background: Baum-Welch training is an expectation-maximisation algorithm for training the emission and transition probabilities of hidden Markov models in a fully automated way. It can be employed as long as a training set of annotated sequences is known, and provides a rigorous way to derive parameter values which are guaranteed to be at least locally optimal. For complex hidden Markov models such as pair hidden Markov models and very long training sequences, even the most efficient algorithms for Baum-Welch training are currently too memory-consuming. This has so far effectively prevented the automatic parameter training of hidden Markov models that are currently used for biological sequence analyses.

Results: We introduce the first linear space algorithm for Baum-Welch training. For a hidden Markov model with M states, T free transition and E free emission parameters, and an input sequence of length L, our new algorithm requires O(M) memory and O(L M T_max (T + E)) time for one Baum-Welch iteration, where T_max is the maximum number of states that any state is connected to. The most memory efficient algorithm until now was the checkpointing algorithm with O(log(L) M) memory and O(log(L) L M T_max) time requirement. Our novel algorithm thus renders the memory requirement completely independent of the length of the training sequences.
More generally, for an n-hidden Markov model and n input sequences of length L, the memory requirement of O(log(L) L^(n-1) M) is reduced to O(L^(n-1) M) memory while the running time is changed from O(log(L) L^n M T_max + L^n (T + E)) to O(L^n M T_max (T + E)). An added advantage of our new algorithm is that a reduced time requirement can be traded for an increased memory requirement and vice versa, such that for any c ∈ {1, ..., (T + E)}, a time requirement of L^n M T_max c incurs a memory requirement of L^(n-1) M (T + E - c).

Conclusion: For the large class of hidden Markov models used for example in gene prediction, whose number of states does not scale with the length of the input sequence, our novel algorithm can thus be both faster and more memory-efficient than any of the existing algorithms.
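To make the O(M) memory claim concrete, the following is a minimal sketch of one way to obtain an expected transition count without storing a full forward or backward table: alongside the forward values for the current position, it carries a second vector holding the count of i→j transitions, weighted over all paths that end in each state. Only the previous position is ever kept. This is an illustration under simplifying assumptions, not necessarily the authors' exact formulation: the function name `expected_transition_count`, the initial distribution `init`, and the absence of explicit start and end states are all assumptions made here, and the sequence is assumed to have non-zero probability under the model.

```python
def expected_transition_count(obs, init, trans, emit, i, j):
    """Expected number of i->j transitions given obs, using O(M) memory.

    obs   : list of symbol indices (length L)
    init  : init[m]      = probability of starting in state m (assumed here)
    trans : trans[n][m]  = probability of moving from state n to state m
    emit  : emit[m][x]   = probability that state m emits symbol x
    """
    M = len(init)
    # f[m]: forward value for the current position and state m.
    f = [init[m] * emit[m][obs[0]] for m in range(M)]
    # t[m]: i->j transition count, weighted over all paths ending in state m.
    t = [0.0] * M
    for x in obs[1:]:
        new_f = [emit[m][x] * sum(f[n] * trans[n][m] for n in range(M))
                 for m in range(M)]
        # Counts accumulated so far are carried forward like forward values...
        new_t = [emit[m][x] * sum(t[n] * trans[n][m] for n in range(M))
                 for m in range(M)]
        # ...and paths that take the i->j step right now add one more count.
        new_t[j] += emit[j][x] * f[i] * trans[i][j]
        # Rescale both vectors by the same factor to avoid underflow;
        # the final ratio is unaffected because the recursion is linear.
        scale = sum(new_f)
        f = [v / scale for v in new_f]
        t = [v / scale for v in new_t]
    return sum(t) / sum(f)


# Hypothetical two-state, two-symbol model used only for illustration.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.7, 0.3], [0.1, 0.9]]
obs = [0, 0, 1, 1, 0]

counts = [[expected_transition_count(obs, init, trans, emit, i, j)
           for j in range(2)] for i in range(2)]
# The four expected counts add up to len(obs) - 1, the number of transitions.
```

Each call makes one pass over the sequence for a single parameter, which is where the extra factor (T + E) in the stated running time comes from in this sketch; carrying several count vectors at once would reduce the number of passes at the cost of more memory, in the spirit of the time and memory trade-off described above.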


Summary

Introduction

Baum-Welch training is an expectation-maximisation algorithm for training the emission and transition probabilities of hidden Markov models in a fully automated way. It can be employed as long as a training set of annotated sequences is known, and provides a rigorous way to derive parameter values which are guaranteed to be at least locally optimal. When a hidden Markov model (HMM) consisting of M states is used to annotate an input sequence, its predictions crucially depend on its set of emission probabilities ε and transition probabilities. Assigning values to these parameters can be quite difficult: they should be set up such that the model's predictions would perfectly reproduce the known annotation of a large and diverse set of input sequences. Two main scenarios have to be distinguished [1]:


Journal: BMC Bioinformatics
Publication Date: Sep 19, 2005
Citations: 40
License type: cc-by
