Fast Statistical Alignment

Robert K Bradley, Sudeep Juvekar, Jaeyoung Do, Colin Dewey, Adam Roberts, Ian Holmes, Michael Smoot, Lior Pachter

doi:10.1371/journal.pcbi.1000392

Abstract

We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical Alignment program is based on pair hidden Markov models which approximate an insertion/deletion process on a tree and uses a sequence annealing algorithm to combine the posterior probabilities estimated from these models into a multiple alignment. FSA uses its explicit statistical model to produce multiple alignments which are accompanied by estimates of the alignment accuracy and uncertainty for every column and character of the alignment—previously available only with alignment programs which use computationally-expensive Markov Chain Monte Carlo approaches—yet can align thousands of long sequences. Moreover, FSA utilizes an unsupervised query-specific learning procedure for parameter estimation which leads to improved accuracy on benchmark reference alignments in comparison to existing programs. The centroid alignment approach taken by FSA, in combination with its learning procedure, drastically reduces the amount of false-positive alignment on biological data in comparison to that given by other methods. The FSA program and a companion visualization tool for exploring uncertainty in alignments can be used via a web interface at http://orangutan.math.berkeley.edu/fsa/, and the source code is available at http://fsa.sourceforge.net/.
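
To make the phrase "posterior probabilities estimated from these models" concrete, the following is a minimal sketch of how a pair hidden Markov model can yield a posterior probability for every pair of aligned characters. It is not FSA's implementation: it assumes the textbook three-state pair HMM (match, insert-in-x, insert-in-y) with hand-picked DNA parameters, and the function name `match_posteriors` and all parameter values (`delta`, `epsilon`, `p_match`, `q`) are hypothetical. The tree approximation, the learning procedure, and the sequence annealing step mentioned in the abstract are not shown.

```python
from typing import List


def match_posteriors(x: str, y: str, delta: float = 0.05, epsilon: float = 0.3,
                     p_match: float = 0.2, q: float = 0.25) -> List[List[float]]:
    """Return P where P[i][j] is the posterior probability that x[i] aligns to y[j].

    Assumed toy model (not taken from the paper): gap-open probability `delta`,
    gap-extend probability `epsilon`, joint emission `p_match` for each of the
    4 identical DNA pairs, and background emission `q` for a gapped character.
    """
    n, m = len(x), len(y)
    # Spread the remaining emission mass evenly over the 12 mismatching pairs.
    p_mismatch = (1.0 - 4 * p_match) / 12.0

    def emit(i: int, j: int) -> float:
        # Joint emission probability for the pair (x_i, y_j), 1-based positions.
        return p_match if x[i - 1] == y[j - 1] else p_mismatch

    def table() -> List[List[float]]:
        return [[0.0] * (m + 1) for _ in range(n + 1)]

    # Forward pass: f?[i][j] sums over all partial alignments of x[:i] and y[:j]
    # that end in state M, X (x_i against a gap) or Y (y_j against a gap).
    fM, fX, fY = table(), table(), table()
    fM[0][0] = 1.0  # the start behaves like a match state
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                fM[i][j] = emit(i, j) * (
                    (1 - 2 * delta) * fM[i - 1][j - 1]
                    + (1 - epsilon) * (fX[i - 1][j - 1] + fY[i - 1][j - 1]))
            if i > 0:
                fX[i][j] = q * (delta * fM[i - 1][j] + epsilon * fX[i - 1][j])
            if j > 0:
                fY[i][j] = q * (delta * fM[i][j - 1] + epsilon * fY[i][j - 1])
    total = fM[n][m] + fX[n][m] + fY[n][m]

    # Backward pass: b?[i][j] sums over all completions of the alignment
    # given that the model is in the named state after consuming x[:i], y[:j].
    bM, bX, bY = table(), table(), table()
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                bM[i][j] = bX[i][j] = bY[i][j] = 1.0
                continue
            diag = emit(i + 1, j + 1) * bM[i + 1][j + 1] if (i < n and j < m) else 0.0
            gap_x = q * bX[i + 1][j] if i < n else 0.0
            gap_y = q * bY[i][j + 1] if j < m else 0.0
            bM[i][j] = (1 - 2 * delta) * diag + delta * (gap_x + gap_y)
            bX[i][j] = (1 - epsilon) * diag + epsilon * gap_x
            bY[i][j] = (1 - epsilon) * diag + epsilon * gap_y

    # Posterior of a match = (paths through M at (i, j)) / (all paths).
    return [[fM[i][j] * bM[i][j] / total for j in range(1, m + 1)]
            for i in range(1, n + 1)]


if __name__ == "__main__":
    for row in match_posteriors("ACGT", "AGT"):
        print(" ".join(f"{p:.3f}" for p in row))
```

Each row of the returned matrix sums to at most one, and the remainder is the posterior probability that the character is aligned to a gap. Per-character and per-column confidence values of the kind the abstract describes can in principle be read off such a matrix, although how FSA combines them across many sequences is not specified in this excerpt.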

Highlights

The field of biological sequence alignment is very active, with numerous new alignment programs developed every year in response to increasing demand driven by rapidly dropping sequencing costs.
Each type of benchmark is vulnerable to manipulation and may not represent the problem setups which are most relevant to biologists.
The result is that biologists are confronted with many programs and publications, but it is frequently unclear which approach will give the best results for the everyday problems which they seek to address.

Summary

Introduction

The field of biological sequence alignment is very active, with numerous new alignment programs developed every year in response to increasing demand driven by rapidly dropping sequencing costs. The ClustalW program [1,2], published in 1994, remains the most widely used multiple sequence alignment program. In a recent review of multiple sequence alignment [3], the authors remark that "to the best of our knowledge, no significant improvements have been made to the [ClustalW] algorithm since 1994 and several modern methods achieve better performance in accuracy, speed, or both." It is natural to ask, "Why do alignment programs continue to be developed, and why are new tools not more widely adopted by biologists?" The result is that biologists are confronted with many programs and publications, but it is frequently unclear which approach will give the best results for the everyday problems which they seek to address.

Journal: PLoS Computational Biology
Publication Date: May 29, 2009
Citations: 396
License type: CC BY 4.0