Abstract

Multiple Sequence Alignment (MSA) methods are typically benchmarked on sets of reference alignments. Alignment quality can then be expressed by the sum-of-pairs (SP) or column (CS) score, which measures the agreement between a reference alignment and the corresponding query alignment. Both the SP and CS scores treat all mismatches between a query and reference alignment as equally bad. They do not take into account how far apart two amino acids lie in the query alignment when those amino acids should have been matched according to the reference alignment. This matters because the magnitude of alignment shifts is often relevant in biological analyses, including homology modeling and MSA refinement or manual alignment editing. In this study we develop a new alignment benchmark scoring scheme, SPdist, that takes the degree of discordance of mismatches into account by measuring the sequence distance between mismatched residue pairs in the query alignment. Using this new score alongside the standard SP score, we investigate its discriminatory behavior by assessing how well six different MSA methods perform on BAliBASE reference alignments. The SP and SPdist scores yield very similar outcomes when the reference and query alignments are close. For more divergent reference alignments, however, the SPdist score is able to distinguish between methods that keep alignments approximately close to the reference and those exhibiting larger shifts. Using SPdist together with SP scoring, we were able to better delineate differences in alignment quality between alternative MSA methods. With a case study we illustrate why it is important, from a biological perspective, to consider the separation of mismatches. The SPdist scoring scheme has been implemented in the VerAlign web server (http://www.ibi.vu.nl/programs/veralignwww/). The code for calculating the SPdist score is also available upon request.
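The SP and CS scores, and the idea of giving partial credit for near misses, can be sketched in Python. This is a minimal illustration under stated assumptions. It is not the SPdist implementation in VerAlign. Alignments are taken to be lists of equal-length gapped strings with `-` as the gap symbol, and the query is assumed to contain the same sequences in the same order as the reference. All function names are hypothetical. In particular, `pair_offsets` measures a shift as an offset in query columns rather than the sequence distance used by SPdist. `distance_weighted_sp` is an invented 1 / (1 + offset) weighting, used only to show how two queries with identical SP scores can still be told apart.

```python
from itertools import combinations

Alignment = list[str]  # one gapped row per sequence; '-' marks a gap


def residue_columns(row: str) -> list[int]:
    """Column index occupied by each residue of a gapped row, in sequence order."""
    return [c for c, ch in enumerate(row) if ch != "-"]


def aligned_pairs(aln: Alignment) -> set[tuple[int, int, int, int]]:
    """All residue pairs (seq_a, res_i, seq_b, res_j) placed in the same column."""
    cols = [residue_columns(row) for row in aln]
    pairs = set()
    for a, b in combinations(range(len(aln)), 2):
        col_to_j = {c: j for j, c in enumerate(cols[b])}
        for i, c in enumerate(cols[a]):
            if c in col_to_j:
                pairs.add((a, i, b, col_to_j[c]))
    return pairs


def sp_score(ref: Alignment, query: Alignment) -> float:
    """Fraction of reference residue pairs that the query reproduces exactly."""
    ref_pairs = aligned_pairs(ref)
    return len(ref_pairs & aligned_pairs(query)) / len(ref_pairs)


def column_sets(aln: Alignment) -> list[frozenset]:
    """Each column as the set of (sequence, residue index) entries it contains."""
    sets = [set() for _ in range(len(aln[0]))]
    for s, row in enumerate(aln):
        for i, c in enumerate(residue_columns(row)):
            sets[c].add((s, i))
    return [frozenset(x) for x in sets]


def cs_score(ref: Alignment, query: Alignment) -> float:
    """Fraction of reference columns (with >= 2 residues) reproduced exactly."""
    ref_cols = [c for c in column_sets(ref) if len(c) >= 2]
    query_cols = set(column_sets(query))
    return sum(c in query_cols for c in ref_cols) / len(ref_cols)


def pair_offsets(ref: Alignment, query: Alignment) -> list[int]:
    """Query column offset between the two residues of every reference pair.

    0 means the pair is reproduced; larger values mean larger shifts.
    NOTE: a column-based stand-in, not SPdist's sequence-distance definition.
    """
    qcols = [residue_columns(row) for row in query]
    return [abs(qcols[a][i] - qcols[b][j]) for a, i, b, j in sorted(aligned_pairs(ref))]


def distance_weighted_sp(ref: Alignment, query: Alignment) -> float:
    """Hypothetical score: each reference pair earns 1 / (1 + offset) credit."""
    offsets = pair_offsets(ref, query)
    return sum(1 / (1 + d) for d in offsets) / len(offsets)
```

Take a reference of two identical sequences, `["ACDE", "ACDE"]`. A query shifted by one position (`["ACDE-", "-ACDE"]`) and a query shifted by three (`["ACDE---", "---ACDE"]`) both score 0 under SP and CS. The distance-weighted variant still ranks the smaller shift higher (0.5 versus 0.25), which is the kind of distinction the abstract attributes to SPdist.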

Highlights

  • Since its inception in the 1970s and early 1980s [1, 2], Multiple Sequence Alignment (MSA) has been one of the most prominent computational techniques in modern molecular biology, and its significance and popularity have only increased since the advent of next-generation sequencing (NGS) techniques

  • To understand how the severity of mismatch shifts affects the alignment score, we benchmarked six different MSA methods against BAliBASE reference alignments using the standard SP score, the shift score, and the new SPdist score

  • We argue that the severity of alignment shifts is often of relevance in numerous biological analyses, e.g. in homology modeling and manual alignment editing

Introduction

Since its inception in the 1970s and early 1980s [1, 2], Multiple Sequence Alignment (MSA) has been one of the most prominent computational techniques in modern molecular biology, and its significance and popularity have only increased since the advent of next-generation sequencing (NGS) techniques. One is a reference-independent alignment score that can be calculated solely from the sequence information of a single MSA, which is very useful when curated or structural information on the proteins of interest is unavailable. An example of such a score is the sum-of-pairs scoring scheme (which we will refer to as the standalone_SP score) [11]. This score is calculated for a stand-alone MSA by exploiting an evolutionary model from which probabilities of pairwise residue conservations and mutations are derived [12, 13].
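A reference-independent sum-of-pairs score of this kind can be sketched as follows. The scoring values are hypothetical toy numbers. They stand in for a log-odds substitution matrix such as BLOSUM62, which would be derived from an evolutionary model. The linear gap penalty is likewise an assumption. The sketch is not the standalone_SP implementation of [11]. It only shows the structure of the calculation: every pair of sequences is scored in every column, and the results are summed.

```python
from itertools import combinations

# Hypothetical toy scores; a real scorer would look up a substitution matrix
# (e.g. BLOSUM62) derived from residue conservation/mutation probabilities.
MATCH, MISMATCH, GAP = 2, -1, -2


def pair_score(x: str, y: str) -> int:
    """Score one residue pair within a column ('-' is a gap)."""
    if x == "-" and y == "-":
        return 0  # gap-gap pairs are conventionally ignored
    if x == "-" or y == "-":
        return GAP
    return MATCH if x == y else MISMATCH


def standalone_sp(aln: list[str]) -> int:
    """Sum of pairwise scores over all columns and all sequence pairs."""
    total = 0
    for column in zip(*aln):
        for x, y in combinations(column, 2):
            total += pair_score(x, y)
    return total
```

For example, `standalone_sp(["ACD", "ACD", "A-D"])` contributes 6 for each fully conserved column (three pairs at 2 each) and 2 - 2 - 2 = -2 for the gapped middle column, giving 10. No reference alignment is needed at any point.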
