Protein sequence alignment with family-specific amino acid similarity matrices

Igor B Kuznetsov

doi:10.1186/1756-0500-4-296

Abstract

Background: Alignment of amino acid sequences by means of dynamic programming is a cornerstone sequence comparison method. The quality of alignments produced by dynamic programming critically depends on the choice of the alignment scoring function. Therefore, for a specific alignment problem one needs a way of selecting the best performing scoring function. This work is focused on the issue of finding optimized protein family- and fold-specific scoring functions for global similarity matrix-based sequence alignment.

Findings: I utilize a comprehensive set of reference alignments obtained from structural superposition of homologous and analogous proteins to design a quantitative statistical framework for evaluating the performance of alignment scoring functions in global pairwise sequence alignment. This framework is applied to study how existing general-purpose amino acid similarity matrices perform on individual protein families and structural folds, and to compare them to family-specific and fold-specific matrices derived in this work. I describe an adaptive alignment procedure that automatically selects an appropriate similarity matrix and optimized gap penalties based on the properties of the sequences being aligned.

Conclusions: The results of this work indicate that using family-specific similarity matrices significantly improves the quality of the alignment of homologous sequences over the traditional sequence alignment based on a single general-purpose similarity matrix. However, using fold-specific similarity matrices can only marginally improve sequence alignment of proteins that share the same structural fold but do not share a common evolutionary origin. The family-specific matrices derived in this work and the optimized gap penalties are available at http://taurus.crc.albany.edu/fsm.

Highlights

Alignment of amino acid sequences by means of dynamic programming is a cornerstone sequence comparison method
The outcome of a dynamic programming procedure applied to align amino acid sequences critically depends on the alignment scoring function used by this procedure [7,8]
The results shown are for groups with 10 or more reference alignments (244 groups in the SUP sub-set, 131 groups in the "Twilight Zone" [37] (TWI) sub-set)

Summary

Introduction

Pairwise alignment of amino acid sequences by means of dynamic programming is a cornerstone sequence comparison method used in a variety of computational applications [1,2,3,4]. The outcome of a dynamic programming procedure applied to align amino acid sequences critically depends on the alignment scoring function used by this procedure [7,8]. This work is focused on the issue of finding optimized protein family- and fold-specific scoring functions for global similarity matrix-based sequence alignment. Improving the quality of substitution matrix-based global pairwise alignments is an important step toward improving other, more complex computational applications.
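To make the dependence on the scoring function concrete, the following is a minimal sketch of global pairwise alignment by dynamic programming (the classic Needleman-Wunsch recurrence). It is not the procedure used in this work: the names `make_toy_matrix` and `global_align`, the toy match/mismatch scores, and the linear gap penalty are all illustrative assumptions, since the summary does not specify the gap model or any matrix values. Swapping in a different `matrix` or `gap` value is where a family-specific matrix and optimized gap penalties would enter.

```python
from typing import Dict, List, Tuple

Matrix = Dict[Tuple[str, str], int]


def make_toy_matrix(alphabet: str, match: int = 2, mismatch: int = -1) -> Matrix:
    """Build a hypothetical similarity matrix (NOT a real amino acid matrix)."""
    return {(x, y): (match if x == y else mismatch)
            for x in alphabet for y in alphabet}


def global_align(a: str, b: str, matrix: Matrix, gap: int = -2) -> Tuple[int, str, str]:
    """Global alignment with a similarity matrix and a linear gap penalty.

    Returns (score, aligned_a, aligned_b). The linear gap penalty is an
    assumption made to keep the sketch short.
    """
    n, m = len(a), len(b)
    # F[i][j] holds the best score for aligning a[:i] with b[:j].
    F: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + matrix[(a[i - 1], b[j - 1])],  # align residues
                F[i - 1][j] + gap,                               # gap in b
                F[i][j - 1] + gap,                               # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + matrix[(a[i - 1], b[j - 1])]:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


toy = make_toy_matrix("ACDE")
score, aligned_a, aligned_b = global_align("ACDE", "ACE", toy, gap=-2)
print(score, aligned_a, aligned_b)
```

With these toy scores, the sequences `ACDE` and `ACE` align with a single gap opposite `D`. Changing the entries of `matrix` or the value of `gap` can change which alignment is optimal, which is why the choice of scoring function matters.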



Journal: BMC Research Notes
Publication Date: Aug 16, 2011
Citations: 49
License type: CC BY 2.0


