Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference.

Ge Tan, Christophe Dessimoz, Javier Herrero, Nick Goldman, Matthieu Muffato, Christian Ledergerber, Manuel Gil

doi:10.1093/sysbio/syv033

Abstract

Phylogenetic inference is generally performed on the basis of multiple sequence alignments (MSA). Because errors in an alignment can lead to errors in tree estimation, there is a strong interest in identifying and removing unreliable parts of the alignment. In recent years several automated filtering approaches have been proposed, but despite their popularity, a systematic and comprehensive comparison of different alignment filtering methods on real data has been lacking. Here, we extend and apply recently introduced phylogenetic tests of alignment accuracy on a large number of gene families and contrast the performance of unfiltered versus filtered alignments in the context of single-gene phylogeny reconstruction. Based on multiple genome-wide empirical and simulated data sets, we show that the trees obtained from filtered MSAs are on average worse than those obtained from unfiltered MSAs. Furthermore, alignment filtering often leads to an increase in the proportion of well-supported branches that are actually wrong. We confirm that our findings hold for a wide range of parameters and methods. Although our results suggest that light filtering (up to 20% of alignment positions) has little impact on tree accuracy and may save some computation time, contrary to widespread practice, we do not generally recommend the use of current alignment filtering methods for phylogenetic inference. By providing a way to rigorously and systematically measure the impact of filtering on alignments, the methodology set forth here will guide the development of better filtering algorithms.

Highlights

Phylogenetic reconstruction pervades computational and evolutionary biology; it is important to be able to compute accurate phylogenetic trees.
Although our results suggest that light filtering has little impact on tree accuracy and may save some computation time, we do not, contrary to widespread practice, generally recommend the use of current alignment filtering methods for phylogenetic inference.
Sequences were aligned with PRANK, and phylogenetic trees were inferred by maximum likelihood.

Summary

Introduction

Phylogenetic reconstruction pervades computational and evolutionary biology; it is important to be able to compute accurate phylogenetic trees. Given a set of sequences, an ideal MSA identifies homologous characters, that is, characters having common ancestry. Computing such an MSA can be challenging. While most alignment programs will correctly identify and align highly conserved regions, regions containing a large number of insertions and/or deletions are typically less reliable. Such unreliable sections and erroneously aligned residues can negatively affect downstream analyses, such as tree inference (Lunter et al. 2008; Wong et al. 2008; Dessimoz and Gil 2010).
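To make the idea of alignment filtering concrete, the following is a minimal Python sketch of one very simple strategy: dropping columns whose gap fraction exceeds a threshold, and reporting the proportion of positions removed. It is not one of the filtering methods evaluated in the paper; the function name `filter_gappy_columns`, the parameter `max_gap_fraction`, and the toy alignment are all assumptions made for illustration.

```python
from typing import Dict, Tuple


def filter_gappy_columns(
    alignment: Dict[str, str], max_gap_fraction: float = 0.5
) -> Tuple[Dict[str, str], float]:
    """Drop alignment columns whose gap fraction exceeds max_gap_fraction.

    Hypothetical helper for illustration only. Returns the filtered
    alignment and the fraction of columns that were removed.
    """
    names = list(alignment)
    rows = [alignment[name] for name in names]
    if not rows:
        return {}, 0.0

    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned sequences must have the same length")
    if length == 0:
        return dict(alignment), 0.0

    # Keep a column only if its share of gap characters is small enough.
    keep = [
        i
        for i in range(length)
        if sum(row[i] == "-" for row in rows) / len(rows) <= max_gap_fraction
    ]

    filtered = {name: "".join(alignment[name][i] for i in keep) for name in names}
    removed_fraction = (length - len(keep)) / length
    return filtered, removed_fraction


# Toy alignment (made up for this example): 4 sequences, 10 columns.
toy = {
    "seqA": "ATG-CA--TG",
    "seqB": "ATGGCA--TG",
    "seqC": "ATG-CATTTG",
    "seqD": "ATG-CA--TG",
}

filtered, removed = filter_gappy_columns(toy, max_gap_fraction=0.5)
for name, seq in filtered.items():
    print(name, seq)
print(f"fraction of columns removed: {removed:.0%}")
```

The removed fraction is the quantity to watch when relating a filter's behaviour to the abstract's notion of light filtering (up to 20% of alignment positions); in this toy case the threshold removes more than that.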

Journal: Systematic Biology
Publication Date: Jun 1, 2015
Citations: 256
License type: cc-by

Similar Papers

Constructing genetic exchange communities among bacteria and archaea
Yingnan Cong
21 Oct 2016

Benchmark datasets and software for developing and testing methods for large-scale multiple sequence alignment and phylogenetic inference.
C Randal Linder ... Rahul Suri
PLoS currents | VOL. 2
18 Nov 2010

Phylogenetic Tree Estimation With and Without Alignment: New Distance Methods and Benchmarking.
Marcin Bogusz ... Simon Whelan
Systematic biology | VOL. 66
14 Sep 2016

Class of Multiple Sequence Alignment Algorithm Affects Genomic Analysis
B P Blackburne ... S Whelan
Molecular Biology and Evolution | VOL. 30
09 Nov 2012
