Abstract

Background

Alignment of amino acid sequences by means of dynamic programming is a cornerstone sequence comparison method. The quality of alignments produced by dynamic programming critically depends on the choice of the alignment scoring function. Therefore, for a specific alignment problem one needs a way of selecting the best performing scoring function. This work is focused on the issue of finding optimized protein family- and fold-specific scoring functions for global similarity matrix-based sequence alignment.

Findings

I utilize a comprehensive set of reference alignments obtained from structural superposition of homologous and analogous proteins to design a quantitative statistical framework for evaluating the performance of alignment scoring functions in global pairwise sequence alignment. This framework is applied to study how existing general-purpose amino acid similarity matrices perform on individual protein families and structural folds, and to compare them to family-specific and fold-specific matrices derived in this work. I describe an adaptive alignment procedure that automatically selects an appropriate similarity matrix and optimized gap penalties based on the properties of the sequences being aligned.

Conclusions

The results of this work indicate that using family-specific similarity matrices significantly improves the quality of the alignment of homologous sequences over the traditional sequence alignment based on a single general-purpose similarity matrix. However, using fold-specific similarity matrices can only marginally improve sequence alignment of proteins that share the same structural fold but do not share a common evolutionary origin. The family-specific matrices derived in this work and the optimized gap penalties are available at http://taurus.crc.albany.edu/fsm.

Highlights

  • Alignment of amino acid sequences by means of dynamic programming is a cornerstone sequence comparison method

  • The outcome of a dynamic programming procedure applied to align amino acid sequences critically depends on the alignment scoring function used by this procedure [7,8]

  • The results shown are for groups with 10 or more reference alignments (244 groups in the SUP sub-set, 131 groups in the "Twilight Zone" [37] sub-set (TWI))


Summary

Introduction

Pairwise alignment of amino acid sequences is a cornerstone sequence comparison method used in a variety of computational applications [1,2,3,4]. The outcome of a dynamic programming procedure applied to align amino acid sequences critically depends on the alignment scoring function used by this procedure [7,8]. This work focuses on finding optimized protein family- and fold-specific scoring functions for global similarity matrix-based sequence alignment. Improving the quality of substitution matrix-based global pairwise alignments is an important step in improving other, more complex computational applications.
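To illustrate how the scoring function enters a global alignment, here is a minimal sketch of dynamic programming alignment in Python. It is not the procedure used in this work: it assumes a linear gap penalty for brevity, and the names `global_align` and `toy_score` are hypothetical. The similarity function is a toy match/mismatch rule standing in for a real amino acid similarity matrix.

```python
from typing import Callable, Tuple


def toy_score(x: str, y: str) -> int:
    """Toy similarity function (assumption): +1 for a match, -1 otherwise."""
    return 1 if x == y else -1


def global_align(
    a: str,
    b: str,
    score: Callable[[str, str], int],
    gap: int,
) -> Tuple[int, str, str]:
    """Global alignment by dynamic programming with a linear gap penalty.

    Returns the optimal score and one optimal pair of gapped sequences.
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = -gap * i
    for j in range(1, m + 1):
        dp[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # substitution
                dp[i - 1][j] - gap,  # gap in b
                dp[i][j - 1] - gap,  # gap in a
            )

    # Trace back from the bottom-right corner to recover one alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and (j == 0 or dp[i][j] == dp[i - 1][j] - gap):
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(global_align("AC", "C", toy_score, gap=1))
```

Only the `score` callable and the `gap` value define the scoring function here, so replacing `toy_score` with a lookup into a general-purpose or family-specific similarity matrix changes the resulting alignment without touching the recursion itself.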

