Logarithmic gap costs decrease alignment accuracy.

Reed A Cartwright

doi:10.1186/1471-2105-7-527

Abstract

Background: Studies on the distribution of indel sizes have consistently found that they obey a power law. This finding has led several scientists to propose that logarithmic gap costs, G(k) = a + c ln k, are more biologically realistic than affine gap costs, G(k) = a + bk, for sequence alignment. Since quick and efficient affine gap costs are currently the most popular way to globally align sequences, the goal of this paper is to determine whether logarithmic gap costs improve alignment accuracy significantly enough to merit their use over the faster affine gap costs.

Results: A group of simulated sequence pairs was globally aligned using affine, logarithmic, and log-affine gap costs. Alignment accuracy was calculated by comparing the resulting alignments to the actual alignments of the sequence pairs, and the gap costs were then compared by average alignment accuracy. Log-affine gap costs had the best accuracy, followed closely by affine gap costs, while logarithmic gap costs performed poorly. Subsequently, a model was developed to explain the results.

Conclusion: Contrary to initial expectations, logarithmic gap costs produce poor alignments and are actually not implied by the power-law behavior of gap sizes, given typical match and mismatch costs. Furthermore, affine gap costs not only produce accurate alignments but are also good approximations to biologically realistic gap costs. This work provides added confidence in the biological relevance of existing alignment algorithms.
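A minimal Python sketch of the three gap-cost schemes compared in the paper, where k is the gap length. The affine and logarithmic forms follow the formulas given in the abstract. The abstract does not define the log-affine form, so G(k) = a + bk + c ln k is an assumption here. The function names and the parameter values in the demo are hypothetical illustrations, not settings from the paper.

```python
import math


def affine_cost(k: int, a: float, b: float) -> float:
    """Affine gap cost G(k) = a + b*k for a gap of length k >= 1."""
    return a + b * k


def log_cost(k: int, a: float, c: float) -> float:
    """Logarithmic gap cost G(k) = a + c*ln(k) for a gap of length k >= 1."""
    return a + c * math.log(k)


def log_affine_cost(k: int, a: float, b: float, c: float) -> float:
    """Log-affine gap cost, ASSUMED here to be G(k) = a + b*k + c*ln(k)."""
    return a + b * k + c * math.log(k)


if __name__ == "__main__":
    # Arbitrary illustrative parameters (a = 5, b = c = 1), not values from the paper.
    for k in (1, 2, 10, 100):
        print(k, affine_cost(k, 5.0, 1.0), round(log_cost(k, 5.0, 1.0), 3))
```

Running the demo shows the practical difference between the schemes: the affine cost grows linearly with gap length, while the logarithmic cost grows only slowly, so long gaps become comparatively cheap under logarithmic costs.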

Highlights

Studies on the distribution of indel sizes have consistently found that they obey a power law
Sequence alignments are essential to the study of molecular biology and systematics because they purport to reveal regions in sequences that are homologous
First, how do the best gap costs for each scheme compare to one another? Second, how does the maximum alignment accuracy of each scheme compare for each sequence pair? The first question investigates what happens if researchers use a single gap cost across many alignments, and the second investigates what happens if researchers optimize gap costs for each alignment.
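The two questions above correspond to two different ways of summarizing accuracy results for one gap-cost scheme. The following Python sketch assumes a hypothetical layout in which `accuracy[params][i]` holds the accuracy of sequence pair i aligned with the gap-cost parameters labelled `params`. The layout and function names are illustrative assumptions, not the paper's code.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical layout for ONE gap-cost scheme (e.g. affine):
# accuracy[params][i] = accuracy of the alignment of sequence pair i
# produced with the gap-cost parameters labelled `params`.
Accuracy = Dict[str, List[float]]


def best_single_cost(accuracy: Accuracy) -> Tuple[str, float]:
    """Question 1: use one gap cost for every pair.

    Returns the parameter label with the highest mean accuracy and that mean.
    """
    best = max(accuracy, key=lambda params: mean(accuracy[params]))
    return best, mean(accuracy[best])


def per_pair_optimum(accuracy: Accuracy) -> float:
    """Question 2: optimize the gap cost separately for each pair.

    Takes the best accuracy any parameter setting achieves on each pair,
    then averages those per-pair maxima.
    """
    per_param = list(accuracy.values())
    n_pairs = len(per_param[0])
    return mean(max(scores[i] for scores in per_param) for i in range(n_pairs))
```

For example, with toy accuracies {"p1": [0.9, 0.5], "p2": [0.6, 0.7]} (made-up numbers, not data from the paper), the best single setting is p1 with a mean of about 0.7, while per-pair optimization averages 0.9 and 0.7 to about 0.8. The per-pair value can never fall below the best single-cost mean, because each pair's maximum is at least its accuracy under that single cost.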

Summary

Introduction

Studies on the distribution of indel sizes have consistently found that they obey a power law. This finding has led several scientists to propose that logarithmic gap costs, G(k) = a + c ln k, are more biologically realistic than affine gap costs, G(k) = a + bk, for sequence alignment. Researchers usually need to align sequences before they can study them. There are two main types of alignment algorithms: local and global. Local alignment algorithms like FASTA [3] and BLAST [4] attempt to align only parts of sequences, often avoiding gaps, whereas global alignment algorithms like CLUSTAL [5,6] and MCALIGN [7,8] attempt to align entire sequences, explicitly handling gaps.
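Global alignment must charge a cost for every gap, and the choice of gap cost affects both accuracy and speed. The following is a minimal Python sketch, not the implementation used by CLUSTAL, MCALIGN, or the paper. It computes the minimum global alignment cost for an arbitrary gap cost function G(k), so the same code covers affine, logarithmic, and log-affine schemes. The function name and the default costs (match 0, mismatch 1) are assumptions for illustration.

```python
import math
from typing import Callable

INF = math.inf


def global_align_cost(a: str, b: str, gap: Callable[[int], float],
                      mismatch: float = 1.0, match: float = 0.0) -> float:
    """Minimum global alignment cost of a and b under an arbitrary gap cost gap(k).

    Three-state recursion that scans every possible gap length:
      M: alignment ends with a[i-1] aligned to b[j-1]
      X: ends with a gap in b (a block of a's residues against gaps)
      Y: ends with a gap in a (a block of b's residues against gaps)
    A gap block is never directly preceded by another gap block in the same
    sequence, so each maximal gap of length k is charged exactly gap(k).
    """
    n, m = len(a), len(b)
    M = [[INF] * (m + 1) for _ in range(n + 1)]
    X = [[INF] * (m + 1) for _ in range(n + 1)]
    Y = [[INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0  # empty prefix aligned to empty prefix
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                sub = match if a[i - 1] == b[j - 1] else mismatch
                M[i][j] = min(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + sub
            if i > 0:
                # Gap of length k in b, preceded by a match or a gap in a.
                X[i][j] = min(min(M[i - k][j], Y[i - k][j]) + gap(k)
                              for k in range(1, i + 1))
            if j > 0:
                # Gap of length k in a, preceded by a match or a gap in b.
                Y[i][j] = min(min(M[i][j - k], X[i][j - k]) + gap(k)
                              for k in range(1, j + 1))
    return min(M[n][m], X[n][m], Y[n][m])
```

For instance, `global_align_cost("ACGT", "AGT", lambda k: 5.0 + k)` uses an affine cost with a = 5 and b = 1, and `lambda k: 5.0 + math.log(k)` swaps in a logarithmic cost. Because this general version scans every gap length, it takes O(nm(n + m)) time. For affine costs, Gotoh's algorithm tracks the same three states with constant work per cell and runs in O(nm), which is the speed advantage of affine gap costs mentioned in the abstract.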

Journal: BMC Bioinformatics
Publication Date: Dec 1, 2006
Citations: 28
License type: CC BY 2.0

Similar Papers

Bi-alignments with affine gaps costs
Peter F Stadler ... Sebastian Will
Algorithms for Molecular Biology, Vol. 17, 16 May 2022

Optimal Multiple Parsimony Alignment with Affine Gap Cost Using a Phylogenetic Tree
Bjarne Knudsen
01 Jan 2003

Optimal sequence alignment using affine gap costs
Stephen F Altschul ... Bruce W Erickson
Bulletin of Mathematical Biology, Vol. 48, 01 Jan 1986

Generalized affine gap costs for protein sequence alignment
Stephen F Altschul
Proteins: Structure, Function, and Genetics, Vol. 32, 01 Jul 1998
