Quantifying the displacement of mismatches in multiple sequence alignment benchmarks.

Punto Bawono, Sanne Abeln, Arjan Van Der Velde, Jaap Heringa

doi:10.1371/journal.pone.0127431


Open Access

https://doi.org/10.1371/journal.pone.0127431


Journal: PloS one	Publication Date: May 19, 2015
Citations: 7	License type: CC BY 4.0

Affiliation: Academic Center for Dentistry Amsterdam

Abstract

Multiple Sequence Alignment (MSA) methods are typically benchmarked on sets of reference alignments. The quality of an alignment can then be represented by the sum-of-pairs (SP) or column (CS) scores, which measure the agreement between a reference alignment and the corresponding query alignment. Both the SP and CS scores treat all mismatches between a query and reference alignment as equally bad: they do not take into account how far apart, in the query alignment, two amino acids lie that should have been matched according to the reference alignment. This matters because the magnitude of alignment shifts is often relevant in biological analyses, including homology modeling and MSA refinement or manual alignment editing. In this study we develop a new alignment benchmark scoring scheme, SPdist, that takes the degree of discordance of mismatches into account by measuring the sequence distance between mismatched residue pairs in the query alignment. Using this new score along with the standard SP score, we investigate the discriminatory behavior of the new score by assessing how well six different MSA methods perform with respect to BAliBASE reference alignments. The SP score and the SPdist score yield very similar outcomes when the reference and query alignments are close. However, for more divergent reference alignments the SPdist score is able to distinguish between methods that keep alignments approximately close to the reference and those exhibiting larger shifts. We observed that by using SPdist together with SP scoring we were able to better delineate the difference in alignment quality between alternative MSA methods. With a case study we exemplify why it is important, from a biological perspective, to consider the separation of mismatches. The SPdist scoring scheme has been implemented in the VerAlign web server (http://www.ibi.vu.nl/programs/veralignwww/). The code for calculating the SPdist score is also available upon request.
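To make the distinction in the abstract concrete, the following is a minimal Python sketch of the reference-based SP and CS scores, together with a simple displacement measure. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, alignments are assumed to be lists of equal-length gapped strings using `-` for gaps, and `mean_pair_displacement` is a simplified column-offset stand-in for the idea behind SPdist rather than the SPdist formula itself, which the abstract does not spell out.

```python
from itertools import combinations


def column_residue_indices(alignment, gap="-"):
    """Per column, the residue index of each sequence (None where it has a gap)."""
    counters = [0] * len(alignment)
    columns = []
    for col in range(len(alignment[0])):
        column = []
        for s, row in enumerate(alignment):
            if row[col] == gap:
                column.append(None)
            else:
                column.append(counters[s])
                counters[s] += 1
        columns.append(tuple(column))
    return columns


def aligned_pairs(alignment):
    """All residue pairs (seq s, index i, seq t, index j) placed in the same column."""
    pairs = set()
    for column in column_residue_indices(alignment):
        for s, t in combinations(range(len(column)), 2):
            if column[s] is not None and column[t] is not None:
                pairs.add((s, column[s], t, column[t]))
    return pairs


def sp_score(reference, query):
    """Fraction of reference residue pairs that are also aligned in the query."""
    ref_pairs = aligned_pairs(reference)
    return len(ref_pairs & aligned_pairs(query)) / len(ref_pairs)


def cs_score(reference, query):
    """Fraction of reference columns reproduced exactly in the query."""
    ref_cols = column_residue_indices(reference)
    query_cols = set(column_residue_indices(query))
    return sum(c in query_cols for c in ref_cols) / len(ref_cols)


def residue_columns(alignment):
    """Map (sequence, residue index) to the column where that residue sits."""
    positions = {}
    for col, column in enumerate(column_residue_indices(alignment)):
        for s, idx in enumerate(column):
            if idx is not None:
                positions[(s, idx)] = col
    return positions


def mean_pair_displacement(reference, query):
    """Hypothetical illustration (NOT the published SPdist formula): average
    number of query columns separating residues the reference aligns."""
    cols = residue_columns(query)
    ref_pairs = aligned_pairs(reference)
    total = sum(abs(cols[(s, i)] - cols[(t, j)]) for s, i, t, j in ref_pairs)
    return total / len(ref_pairs)


# Toy data: the same two sequences, shifted by one and by three columns.
reference = ["ACDEF", "ACDEF"]
query_near = ["ACDEF-", "-ACDEF"]
query_far = ["ACDEF---", "---ACDEF"]

for name, query in [("near", query_near), ("far", query_far)]:
    print(name, sp_score(reference, query), cs_score(reference, query),
          mean_pair_displacement(reference, query))
```

In this toy case both shifted queries receive an SP and CS score of 0, so the standard scores cannot tell them apart, whereas the displacement measure separates the one-column shift from the three-column shift. That is the kind of difference a distance-aware score is meant to expose.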

Highlights

Since its inception in the 1970s to early 1980s [1, 2], Multiple Sequence Alignment (MSA) has been one of the most prominent computational techniques in modern molecular biology, and its significance and popularity have only increased since the advent of next-generation sequencing (NGS) techniques
In order to understand the effect of the severity of mismatch shifts on the alignment score, we benchmarked six different MSA methods against BAliBASE reference alignments using the standard SP score, shift score and the new SPdist score
We argue that the severity of alignment shifts is often of relevance in numerous biological analyses, e.g. in homology modeling and manual alignment editing

Summary

Introduction

Since its inception in the 1970s to early 1980s [1, 2], Multiple Sequence Alignment (MSA) has been one of the most prominent computational techniques in modern molecular biology, and its significance and popularity have only increased since the advent of next-generation sequencing (NGS) techniques. One type of alignment quality score is reference-independent: it can be calculated solely from the sequence information of a single MSA, which is very useful when curated or structural information on the proteins of interest is unavailable. An example of such a score is the sum-of-pairs scoring scheme (which we will refer to as the standalone_SP score) [11]. This score is calculated for a stand-alone MSA by exploiting an evolutionary model from which probabilities of pairwise residue conservations and mutations are derived [12, 13].
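As a rough illustration of a reference-independent sum-of-pairs score, the sketch below sums a pairwise score over every pair of characters in every column of a single MSA. The function names and the scoring values are assumptions made for this example: `toy_substitution_score` is a hypothetical placeholder for a real substitution matrix derived from an evolutionary model, and the gap handling is an arbitrary choice, not the scheme described in [11].

```python
from itertools import combinations


def toy_substitution_score(a, b):
    """Hypothetical placeholder for a substitution matrix lookup.

    Assumed values: identical residues score 2, different residues -1,
    a residue against a gap -1, and gap against gap 0.
    """
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -1
    return 2 if a == b else -1


def standalone_sp(alignment, score=toy_substitution_score):
    """Sum the pairwise score over all sequence pairs in every column."""
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += score(a, b)
    return total


# Toy three-sequence alignment of equal-length gapped strings.
msa = ["AC-", "ACD", "GCD"]
print(standalone_sp(msa))
```

Passing a different `score` callable, for example one backed by a log-odds substitution matrix, would change the values but not the structure of the calculation.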



