How sequence alignment scores correspond to probability models.

Martin C Frith, Alfonso Valencia

doi:10.1093/bioinformatics/btz576

Open Access

https://doi.org/10.1093/bioinformatics/btz576

Journal: Bioinformatics	Publication Date: Jul 22, 2019
Citations: 13	License type: CC BY 4.0

Affiliation: The University of Tokyo, Waseda University

Abstract

Sequence alignment remains fundamental in bioinformatics. Pair-wise alignment is traditionally based on ad hoc scores for substitutions, insertions and deletions, but can also be based on probability models (pair hidden Markov models: PHMMs). PHMMs enable us to: fit the parameters to each kind of data, calculate the reliability of alignment parts and measure sequence similarity integrated over possible alignments. This study shows how multiple models correspond to one set of scores. Scores can be converted to probabilities by partition functions with a 'temperature' parameter: for any temperature, this corresponds to some PHMM. There is a special class of models with balanced length probability, i.e. no bias toward either longer or shorter alignments. The best way to score alignments and assess their significance depends on the aim: judging whether whole sequences are related versus finding related parts. This clarifies the statistical basis of sequence alignment. Supplementary data are available at Bioinformatics online.

Highlights

The main way of analyzing nucleotide and protein sequences is by comparing them to related sequences
This study describes the equivalence between the partition function approach and pair hidden Markov models (PHMMs)
This study describes the many-to-one relationship between probability models and score parameters for sequence alignment

Summary

Introduction

The main way of analyzing nucleotide and protein sequences is by comparing them to related sequences. This is usually done by defining scores for aligned monomers, insertions and deletions, and finding alignments with maximal total score. Alignment models typically omit rapid evolution of tandem repeats, neighbor-dependence of substitutions, etc., but have proven useful. Another approach is to define alignment probabilities as exponentiated scores [27, 18]: prob(alignment) ∝ exp(alignment score / t), where t is a 'temperature' parameter. This study clarifies the notion of alignment models with balanced length probability, i.e. no bias towards either longer or shorter alignments. It concludes with a discussion of the best way to score alignments and assess their significance, depending on our precise aim. One previous study describes a one-to-many relationship between alignment scores and probabilities [1], but lacks most of the results presented here.
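The relation prob(alignment) ∝ exp(alignment score / t) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function name `alignment_probabilities` and the three candidate scores are hypothetical. It also normalizes over a short explicit list of alignments, whereas a real partition function sums over all possible alignments, typically by dynamic programming.

```python
import math

def alignment_probabilities(scores, t):
    """Return probabilities proportional to exp(score / t).

    `scores` is a hypothetical list of total scores for a few
    alternative alignments; `t` is the temperature parameter.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    # Subtract the maximum score before exponentiating, for numerical
    # stability; this does not change the normalized result.
    top = max(scores)
    weights = [math.exp((s - top) / t) for s in scores]
    total = sum(weights)  # partition function over the listed alignments
    return [w / total for w in weights]

# Illustrative (made-up) scores of three alternative alignments.
scores = [12, 10, 7]
for t in (1.0, 3.0):
    probs = alignment_probabilities(scores, t)
    print(f"t = {t}: {[round(p, 3) for p in probs]}")
```

A lower t concentrates the probability on the top-scoring alignment, while a higher t spreads it more evenly across the alternatives.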

Review of score-based alignment

Degrees of freedom

Algorithms for local alignment

Review of alignment probability models

Degrees of freedom in the gapless model

Homogeneous letter probabilities

Uniform length probability

Examples

Linear gap costs

Balanced length probability

Affine gap costs

Limits to degrees of freedom

Non-uniqueness of t

Sum of alignment probabilities

Discussion

Useful probability calculations

Sequences with multiple similar segments

Alignment significance

Aims of sequence comparison

Similar Papers

Sequence Alignments and Pair Hidden Markov Models Using Evolutionary History
Bjarne Knudsen ... Michael M Miyamoto
Journal of Molecular Biology | VOL. 333
28 Sep 2003

Improving pairwise sequence alignment accuracy using near-optimal protein sequence alignments
Michael L Sierk ... William R Pearson
BMC Bioinformatics | VOL. 11
22 Mar 2010

Probabilistic approaches to alignment with tandem repeats
Michal Nánási ... Broňa Brejová
Algorithms for Molecular Biology | VOL. 9
01 Jan 2014

Computing word similarity and identifying cognates with pair hidden Markov models
Wesley Mackay ... Grzegorz Kondrak
01 Jan 2004
