ZipHMMlib: a highly optimised HMM library exploiting repetitions in the input to speed up the forward algorithm

Andreas Sand, Christian NS Pedersen, Martin Kristiansen, Thomas Mailund

doi:10.1186/1471-2105-14-339

Abstract

Background: Hidden Markov models are widely used for genome analysis as they combine ease of modelling with efficient analysis algorithms. Calculating the likelihood of a model using the forward algorithm has worst-case time complexity linear in the length of the sequence and quadratic in the number of states in the model. For genome analysis, however, the length runs to millions or billions of observations, and when maximising the likelihood hundreds of evaluations are often needed. A time-efficient forward algorithm is therefore a key ingredient in an efficient hidden Markov model library.

Results: We have built a software library for efficiently computing the likelihood of a hidden Markov model. The library exploits commonly occurring substrings in the input to reuse computations in the forward algorithm. In a preprocessing step our library identifies common substrings and builds a structure over the computations in the forward algorithm which can be reused. This analysis can be saved between uses of the library and is independent of concrete hidden Markov models, so one preprocessing can be used to run a number of different models. Using this library, we achieve up to 78 times shorter wall-clock time for realistic whole-genome analyses with a real and reasonably complex hidden Markov model. In one particular case the analysis was performed in less than 8 minutes compared to 9.6 hours for the previously fastest library.

Conclusions: We have implemented the preprocessing procedure and forward algorithm as a C++ library, zipHMM, with Python bindings for use in scripts. The library is available at http://birc.au.dk/software/ziphmm/.
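The cost stated in the Background (linear in the sequence length, quadratic in the number of states) can be read off a minimal sketch of the textbook forward algorithm. This is plain Python written for illustration, not code from zipHMM; the function name `forward_log_likelihood` and the list-of-lists model layout are assumptions made here. Scaling is used so that long sequences do not underflow.

```python
import math

def forward_log_likelihood(pi, A, B, obs):
    """Return log P(obs | model) with the scaled forward algorithm.

    Illustrative layout (an assumption, not zipHMM's format):
      pi[i]   : initial probability of state i
      A[i][j] : transition probability from state i to state j
      B[i][o] : probability that state i emits symbol o
    """
    K = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(K)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Recursion: K sums of K terms, i.e. O(K^2) work per observation
            alpha = [
                B[j][obs[t]] * sum(alpha[i] * A[i][j] for i in range(K))
                for j in range(K)
            ]
        # Rescale to sum to one and accumulate the log of the scale factor
        scale = sum(alpha)
        alpha = [a / scale for a in alpha]
        log_lik += math.log(scale)
    return log_lik
```

The outer loop runs once per observation and the inner recursion touches every pair of states, so one likelihood evaluation costs O(N K^2) for N observations and K states. That is the cost that is paid again on every evaluation during likelihood maximisation.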

Highlights

Hidden Markov models are widely used for genome analysis as they combine ease of modelling with efficient analysis algorithms
We have implemented the preprocessing procedure and the forward algorithm in a C++ library named zipHMM
The code provides both a C++ and a Python interface for reading and writing hidden Markov models (HMMs) to files, preprocessing input sequences and saving the results, and computing the likelihood of a model using the forward algorithm

Summary

Results

We have built a software library for efficiently computing the likelihood of a hidden Markov model. In a pre-processing step our library identifies common substrings and builds a structure over the computations in the forward algorithm which can be reused. This analysis can be saved between uses of the library and is independent of concrete hidden Markov models so one preprocessing can be used to run a number of different models. Using this library, we achieve up to 78 times shorter wall-clock time for realistic whole-genome analyses with a real and reasonably complex hidden Markov model. In one particular case the analysis was performed in less than 8 minutes compared to 9.6 hours for the previously fastest library.
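One way to picture how repeated substrings let forward computations be reused is to write each forward step as a matrix-vector product and to give a frequently repeated pair of symbols its own precomputed matrix. The sketch below is a toy illustration under that assumption, not zipHMM's actual data structure or API. The names `preprocess`, `compress_once` and `likelihood` are hypothetical, and scaling is omitted for brevity, so it only suits short sequences.

```python
from collections import Counter

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_vec(X, v):
    return [sum(x * vj for x, vj in zip(row, v)) for row in X]

def compress_once(seq, new_symbol):
    """Replace the most frequent adjacent pair in seq[1:] by new_symbol.
    Uses only the sequence, never the model."""
    counts = Counter(zip(seq[1:], seq[2:]))
    if not counts:
        return seq, None
    pair, freq = counts.most_common(1)[0]
    if freq < 2:
        return seq, None
    out, i = [seq[0]], 1  # the first symbol is kept: it is combined with pi
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out, pair

def preprocess(seq, n_symbols, max_rounds=10):
    """Model-independent step: returns the shortened sequence and the
    list of pairs that the new symbols stand for."""
    pairs = []
    for _ in range(max_rounds):
        seq, pair = compress_once(seq, n_symbols + len(pairs))
        if pair is None:
            break
        pairs.append(pair)
    return seq, pairs

def likelihood(pi, A, B, seq, pairs=()):
    """P(seq | model); symbols >= len(B[0]) refer to entries of pairs."""
    K, M = len(pi), len(B[0])
    # C[o][j][i] = b_j(o) * a_ij, so that alpha_t = C[o_t] applied to alpha_{t-1}
    C = [[[B[j][o] * A[i][j] for i in range(K)] for j in range(K)]
         for o in range(M)]
    for x, y in pairs:
        # Observing x and then y applies C[x] first, then C[y]
        C.append(mat_mul(C[y], C[x]))
    alpha = [pi[i] * B[i][seq[0]] for i in range(K)]
    for o in seq[1:]:
        alpha = mat_vec(C[o], alpha)
    return sum(alpha)
```

In this sketch `preprocess` never looks at the model, which mirrors the statement that one preprocessing can serve a number of different models: the returned sequence and pair list could be stored and passed to `likelihood` with any model over the same alphabet. Each model pays for one matrix product per recorded pair, then runs the recursion over the shorter sequence.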

Journal: BMC Bioinformatics	Publication Date: Nov 22, 2013
Citations: 36	License type: CC BY 2.0
