Accelerated Profile HMM Searches

Sean R Eddy

doi:10.1371/journal.pcbi.1002195

Abstract

Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call “sparse rescaling”. These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches.
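The idea of an "optimal sum of multiple ungapped local alignment segments" can be illustrated with a small dynamic program. The following is only a minimal scalar sketch under stated assumptions, not the HMMER3 implementation: it is not the striped vector-parallel code, the function name `msv_score`, the flat per-segment penalty `segment_cost`, the `mismatch` score, and the toy profile are all hypothetical, and unaligned residues are assumed to score zero.

```python
NEG_INF = float("-inf")

def msv_score(profile, seq, segment_cost=2.0, mismatch=-1.0):
    """Best total score of one or more ungapped segments of seq against profile.

    profile      : list of dicts, one per match position, residue -> score
    segment_cost : assumed flat penalty paid each time a new segment starts
    mismatch     : assumed score for residues missing from a position's dict
    """
    if not profile:
        return 0.0
    n_pos = len(profile)
    best = 0.0                       # best total with the prefix seen so far
    prev = [NEG_INF] * (n_pos + 1)   # prev[k]: previous residue aligned to position k
    for residue in seq:
        start = best - segment_cost  # open a new segment after the best prefix
        cur = [NEG_INF] * (n_pos + 1)
        for k in range(1, n_pos + 1):
            emit = profile[k - 1].get(residue, mismatch)
            # Ungapped: a segment is extended only along the diagonal (k-1 -> k).
            cur[k] = emit + max(prev[k - 1], start)
        best = max(best, max(cur[1:]))  # a segment may end at any position
        prev = cur
    return best

# Toy three-position profile with consensus "ACD" (illustrative values only).
PROFILE = [{"A": 2.0}, {"C": 2.0}, {"D": 2.0}]
print(msv_score(PROFILE, "ACD"))        # one segment: 6 - 2 = 4.0
print(msv_score(PROFILE, "ACDWWWACD"))  # two segments: 4 + 4 = 8.0
```

Because each cell depends only on the diagonal predecessor and a single per-row start value, the inner loop has no dependencies between positions within a row, which is the property that makes this kind of recursion amenable to vector-parallel evaluation.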

Highlights

Sequence database homology searching is one of the most important applications in computational molecular biology
The panel shows results for 76 query profiles, chosen to sample the full range of query lengths in the Pfam protein domain database from 7 to 2,217 residues. These results show that HMMER3 performance is comparable to other fast database search programs; somewhat slower than NCBI BLAST, and somewhat faster than WUBLAST, for example
In describing the “multiple segment Viterbi” (MSV) heuristic and other acceleration methods implemented in HMMER3, I have not addressed the question of whether the MSV heuristic is better or worse than other heuristics, such as those in BLAST or FASTA

Summary

Introduction

Sequence database homology searching is one of the most important applications in computational molecular biology. The most widely used tool for sequence comparison and database search is BLAST [1,2,3]. Since BLAST’s introduction, some important advances have been made in the theory of sequence comparison by using probabilistic inference methods based on profile hidden Markov models (profile HMMs) [4]. However, the BLAST implementation computes optimal local alignment scores using ad hoc gap penalties. This implementation core may not be readily adaptable to a probabilistic insertion/deletion model or to the more powerful “Forward/Backward” HMM algorithm, which computes not just one best-scoring alignment but a sum of probabilities over the entire local alignment ensemble. The Forward algorithm allows a more powerful and formal log-likelihood score statistic to be assigned to each target sequence, and Forward/Backward allows confidence values to be assigned to each aligned residue.
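The difference between scoring one best alignment and summing over the whole ensemble can be seen on a generic toy HMM. The sketch below is an illustration under stated assumptions, not a profile HMM and not HMMER code: the two states, their probabilities, and the helper names `hmm_log_score` and `to_log` are hypothetical. The same recursion yields a Viterbi score when paths are combined with `max` and a Forward score when they are combined with a log-sum-exp.

```python
import math

NEG_INF = float("-inf")

def logsumexp(values):
    """Return log(sum(exp(v) for v in values)) without underflow."""
    peak = max(values)
    if peak == NEG_INF:
        return NEG_INF
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def hmm_log_score(obs, log_start, log_trans, log_emit, combine):
    """Log score of obs: combine=max gives Viterbi, combine=logsumexp gives Forward."""
    states = list(log_start)
    column = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        column = {
            s: log_emit[s][symbol]
            + combine([column[r] + log_trans[r][s] for r in states])
            for s in states
        }
    return combine(list(column.values()))

def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log space."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}

# Hypothetical two-state model; all numbers are illustrative only.
START = to_log({"conserved": 0.5, "background": 0.5})
TRANS = to_log({"conserved": {"conserved": 0.8, "background": 0.2},
                "background": {"conserved": 0.3, "background": 0.7}})
EMIT = to_log({"conserved": {"A": 0.7, "G": 0.3},
               "background": {"A": 0.4, "G": 0.6}})

viterbi = hmm_log_score("AAGA", START, TRANS, EMIT, max)
forward = hmm_log_score("AAGA", START, TRANS, EMIT, logsumexp)
print(viterbi, forward)  # Forward is never smaller than Viterbi
```

The Forward score is at least the Viterbi score because it adds the probability of every other path to that of the single best one.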

Journal: PLoS Computational Biology
Publication Date: Oct 20, 2011
Citations: 5363
License type: CC BY 4.0
