Selecting optimal partitioning schemes for phylogenomic datasets.

Robert Lanfear, Alexandros Stamatakis, David Kainer, Christoph Mayer, Brett Calcott

doi:10.1186/1471-2148-14-82

Abstract

Background: Partitioning involves estimating independent models of molecular evolution for different subsets of sites in a sequence alignment, and has been shown to improve phylogenetic inference. Current methods for estimating best-fit partitioning schemes, however, are only computationally feasible with datasets of fewer than 100 loci. This is a problem because datasets with thousands of loci are increasingly common in phylogenetics.

Methods: We develop two novel methods for estimating best-fit partitioning schemes on large phylogenomic datasets: strict and relaxed hierarchical clustering. These methods use information from the underlying data to cluster together similar subsets of sites in an alignment, and build on clustering approaches that have been proposed elsewhere.

Results: We compare the performance of our methods to each other, and to existing methods for selecting partitioning schemes. We demonstrate that while strict hierarchical clustering has the best computational efficiency on very large datasets, relaxed hierarchical clustering provides scalable efficiency and returns dramatically better partitioning schemes as assessed by common criteria such as AICc and BIC scores.

Conclusions: These two methods provide the best current approaches to inferring partitioning schemes for very large datasets. We provide free open-source implementations of the methods in the PartitionFinder software. We hope that the use of these methods will help to improve the inferences made from large phylogenomic datasets.

Highlights

Partitioning involves estimating independent models of molecular evolution for different subsets of sites in a sequence alignment, and has been shown to improve phylogenetic inference
All three algorithms we discuss in this paper start with a user-defined set of data blocks, and progressively merge data blocks to improve the information-theoretic score of the partitioning scheme
We discuss algorithm performance below in two ways: in terms of the amount that they improve the score of the partitioning scheme relative to the starting scheme which has each data block assigned to an independent subset; and in terms of the percentage improvement that an algorithm achieves relative to the existing greedy algorithm in PartitionFinder
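The merge-and-score idea in the highlights above can be illustrated with a minimal Python sketch. It is a generic greedy merge written for illustration, not PartitionFinder's implementation: the data block names, the `toy_rates` values, and `toy_score` (a per-subset penalty plus within-subset spread, standing in for a real AICc or BIC computed from likelihoods) are all hypothetical. The `relative_improvement` helper assumes one plausible reading of "percentage improvement relative to the greedy algorithm"; the source does not give the exact formula.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Tuple

Subset = FrozenSet[str]
Scheme = List[Subset]

def greedy_merge(blocks: List[str],
                 score: Callable[[Scheme], float]) -> Tuple[Scheme, float]:
    """Start with one subset per data block and repeatedly apply the
    pairwise merge that lowers the score the most (lower is better).
    Stop when no merge improves the score."""
    scheme: Scheme = [frozenset([b]) for b in blocks]
    best = score(scheme)
    while len(scheme) > 1:
        candidates = []
        for a, b in combinations(scheme, 2):
            merged = [s for s in scheme if s not in (a, b)] + [a | b]
            candidates.append((score(merged), merged))
        cand_score, cand = min(candidates, key=lambda t: t[0])
        if cand_score >= best:
            break  # no merge improves the scheme
        scheme, best = cand, cand_score
    return scheme, best

# Hypothetical per-block evolutionary rates (illustrative values only).
toy_rates: Dict[str, float] = {
    "gene1_pos1": 0.10, "gene1_pos2": 0.12,
    "gene1_pos3": 1.00, "gene2_pos3": 1.10,
}

def toy_score(scheme: Scheme) -> float:
    """Toy stand-in for an information-theoretic score: a fixed penalty
    per subset plus the squared spread of rates within each subset."""
    total = 0.5 * len(scheme)
    for subset in scheme:
        rates = [toy_rates[b] for b in subset]
        mean = sum(rates) / len(rates)
        total += sum((r - mean) ** 2 for r in rates)
    return total

def relative_improvement(start: float, algo: float, greedy: float) -> float:
    """Assumed definition: an algorithm's score improvement over the
    starting scheme, as a percentage of the greedy algorithm's improvement."""
    return 100.0 * (start - algo) / (start - greedy)

if __name__ == "__main__":
    blocks = list(toy_rates)
    start = toy_score([frozenset([b]) for b in blocks])
    scheme, final = greedy_merge(blocks, toy_score)
    print("starting score:", start)
    print("final scheme:", [sorted(s) for s in scheme])
    print("final score:", final)
```

With these toy values, blocks with similar rates end up in the same subset, and merging stops once joining dissimilar subsets would cost more than the penalty it saves.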

Summary

Introduction

Partitioning involves estimating independent models of molecular evolution for different subsets of sites in a sequence alignment, and has been shown to improve phylogenetic inference. One of the most important aspects of model selection is to find a model that can account for variation in the substitution process among the sites of the alignment. This variation may include differences in rates of evolution, base frequencies, and substitution patterns, and the challenge is to account for all such variation found in any given dataset. Most importantly for this study, partitioning is still the most practical method with which to account for variation in rates and patterns of substitution in very large datasets. It is important to ensure that partitioned models of molecular evolution are as accurate as possible when they are applied to large datasets, and that is the focus of this study.
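The abstract assesses partitioning schemes with AICc and BIC scores. As a reminder of what those criteria measure, the sketch below implements their standard textbook definitions in Python. The function names are illustrative, and treating the sample size `n` as the number of sites in the alignment is an assumption made here; the example log-likelihoods and parameter counts are invented for illustration and are not results from the paper.

```python
import math

def aic(ln_l: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * ln_l

def aicc(ln_l: float, k: int, n: int) -> float:
    """AIC with the small-sample correction 2k(k+1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic(ln_l, k) + (2 * k * (k + 1)) / (n - k - 1)

def bic(ln_l: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * ln_l

if __name__ == "__main__":
    # Two hypothetical schemes for the same alignment (values invented):
    # a simple scheme with few parameters and a richer one that fits
    # better but estimates many more parameters.
    n_sites = 5000  # assumed sample size: number of alignment sites
    schemes = {"few subsets": (-10000.0, 20), "many subsets": (-9950.0, 60)}
    for name, (ln_l, k) in schemes.items():
        print(name, "AICc:", aicc(ln_l, k, n_sites), "BIC:", bic(ln_l, k, n_sites))
```

For both criteria a lower score is better. Each rewards a higher likelihood and penalises extra parameters, which is why merging subsets can improve a scheme's score even though it lowers the likelihood.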

Methods

Results

Conclusion


Journal: BMC Evolutionary Biology	Publication Date: Jan 1, 2014
Citations: 636	License type: cc-by


