A benchmark study of sequence alignment methods for protein clustering

Yingying Wang, Hongyan Wu, Yunpeng Cai

doi:10.1186/s12859-018-2524-4


Open Access

https://doi.org/10.1186/s12859-018-2524-4


Journal: BMC Bioinformatics
Publication Date: Dec 1, 2018
Citations: 26
License type: open-access

Affiliation: Shenzhen Institutes of Advanced Technology

Abstract

Background: Protein sequence alignment has become a crucial step in many bioinformatics studies over the past decades. Multiple sequence alignment (MSA) and pairwise sequence alignment (PSA) are the two major approaches to sequence alignment. Earlier benchmark studies revealed drawbacks of MSA methods on nucleotide sequence alignments. To test whether similar drawbacks also affect protein sequence alignment analyses, we propose a new benchmark framework for protein clustering based on cluster validity. The framework directly reflects the biological ground truth of the application scenarios that use sequence alignments, and it evaluates alignment quality by how well the biological goal is achieved rather than by sequence-level comparison alone, which avoids the biases introduced by alignment scores or manual alignment templates. Unlike former studies, we calculate the cluster validity score from sequence distances instead of clustering results. This strategy avoids the influence of the choice of clustering method and thus makes the results more dependable.

Results: PSA methods performed better than MSA methods on most of the BAliBASE benchmark datasets. Analyses of the 80 re-sampled benchmark datasets, constructed by randomly choosing 90% of each dataset 10 times, showed similar results.

Conclusions: These results confirm that the drawbacks of MSA methods revealed at the nucleotide level also exist in protein sequence alignment analyses and affect the accuracy of the results.
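The re-sampling step described in the abstract (randomly choosing 90% of each dataset, 10 times) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `resample_subsets`, the fixed seed, sampling without replacement, and rounding the subset size to the nearest integer are all assumptions, since the abstract does not specify them.

```python
import random


def resample_subsets(sequence_ids, fraction=0.9, repeats=10, seed=0):
    """Draw `repeats` random subsets, each holding `fraction` of the sequences.

    Hypothetical helper: sampling is without replacement and the subset size
    is rounded to the nearest integer (both are assumptions).
    """
    ids = list(sequence_ids)
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    k = round(fraction * len(ids))     # subset size, e.g. 18 of 20 sequences
    return [sorted(rng.sample(ids, k)) for _ in range(repeats)]


# Toy dataset of 20 made-up sequence identifiers.
toy_ids = [f"seq{i}" for i in range(20)]
subsets = resample_subsets(toy_ids)
```

Applying such a routine to each source dataset yields 10 subsets per dataset; every alignment method would then be re-scored on each subset to check whether the ranking of methods is stable.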

Highlights

Protein sequence alignment analyses have become a crucial step in many bioinformatics studies over the past decades.
Results showed that (1) Esprit achieved the highest scores on all datasets based on the Silhouette Width (SW) calculation; (2) both Esprit and MUSCLE achieved high scores based on the RS calculation, with Esprit performing slightly better than MUSCLE overall.
Results showed that, based on SW scores, Esprit performed better than other multiple sequence alignment (MSA) methods used in this study on both RV11 and RV12, with SW scores of 0.008933 and 0.107577, respectively (see Fig. 2(a) for details).
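To make the Silhouette Width (SW) scores above concrete, the sketch below computes a mean silhouette width directly from a pairwise distance matrix and a set of reference group labels, without running any clustering algorithm. It uses the standard silhouette definition, s(i) = (b - a) / max(a, b); whether the paper uses exactly this formulation is an assumption, and the function name, toy distances, and labels are made up for illustration.

```python
from collections import defaultdict


def silhouette_width(dist, labels):
    """Mean silhouette width from a square distance matrix and reference labels.

    Standard definition (assumed, not taken from the paper): for item i,
    a = mean distance to its own group, b = smallest mean distance to any
    other group, s(i) = (b - a) / max(a, b).
    """
    members = defaultdict(list)
    for idx, lab in enumerate(labels):
        members[lab].append(idx)
    if len(members) < 2:
        raise ValueError("silhouette width needs at least two groups")

    total = 0.0
    for i, lab_i in enumerate(labels):
        own = members[lab_i]
        if len(own) == 1:
            continue  # singleton group: s(i) is 0 by convention
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(dist[i][j] for j in other) / len(other)
            for lab, other in members.items()
            if lab != lab_i
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(labels)


# Toy alignment-derived distances for four sequences in two reference families.
toy_dist = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.1],
    [0.9, 0.9, 0.1, 0.0],
]
good = silhouette_width(toy_dist, ["famA", "famA", "famB", "famB"])
bad = silhouette_width(toy_dist, ["famA", "famB", "famA", "famB"])
```

In this toy case the distances agree with the reference families, so `good` is close to 1, while the deliberately mismatched labels give a negative `bad`. Under this reading, an alignment method scores higher when the distances it produces separate the known families more cleanly.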

Summary

Introduction

Protein sequence alignment analyses have become a crucial step in many bioinformatics studies over the past decades. To test whether the drawbacks of MSA methods reported for nucleotide sequence alignments also affect protein sequence alignment analyses, we propose a new benchmark framework for protein clustering based on cluster validity. The framework directly reflects the biological ground truth of the application scenarios that use sequence alignments, and it evaluates alignment quality by how well the biological goal is achieved rather than by sequence-level comparison alone, which avoids the biases introduced by alignment scores or manual alignment templates.


