Phylogenetic analysis at deep timescales: Unreliable gene trees, bypassed hidden support, and the coalescence/concatalescence conundrum

John Gatesy, Mark S Springer

doi:10.1016/j.ympev.2014.08.013

Abstract

Large datasets are required to solve difficult phylogenetic problems that are deep in the Tree of Life. Currently, two divergent systematic methods are commonly applied to such datasets: the traditional supermatrix approach (= concatenation) and “shortcut” coalescence (= coalescence methods wherein gene trees and the species tree are not co-estimated). When applied to ancient clades, these contrasting frameworks often produce congruent results, but in recent phylogenetic analyses of Placentalia (placental mammals), this is not the case. A recent series of papers has alternately disputed and defended the utility of shortcut coalescence methods at deep phylogenetic scales. Here, we examine this exchange in the context of published phylogenomic data from Mammalia; in particular, we explore two critical issues: the delimitation of data partitions (“genes”) in coalescence analysis and the hidden support that emerges with the combination of such partitions in phylogenetic studies. Hidden support, that is, increased support for a clade in combined analysis of all data partitions relative to the support evident in separate analyses of the various data partitions, is a hallmark of the supermatrix approach and a primary rationale for concatenating all characters into a single matrix. In the most extreme cases of hidden support, relationships that are contradicted by all gene trees are supported when all of the genes are analyzed together. A valid fear is that shortcut coalescence methods might bypass or distort character support that is hidden in individual loci because small gene fragments are analyzed in isolation. Given the extensive systematic database for Mammalia, the assumptions and applicability of shortcut coalescence methods can be assessed with rigor to complement a small but growing body of simulation work that has directly compared these methods to concatenation.
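The definition of hidden support given above can be made concrete with a small numerical sketch. The function below treats hidden support for a clade as the support in the combined analysis minus the summed support from the separate partition analyses. This is one possible formalization, in the spirit of hidden branch support measures from the parsimony literature, and is not taken from the abstract itself. The function name and all scores are hypothetical and chosen only to illustrate the extreme case in which every partition contradicts a clade that the combined data support.

```python
def hidden_support(combined: float, separate: list[float]) -> float:
    """Support in the combined analysis minus the summed support
    from the separate analyses of each data partition.

    Assumption: support is a signed score per clade, where a negative
    value means the partition contradicts the clade.
    """
    return combined - sum(separate)


# Hypothetical scores for a single clade (illustrative values only).
separate_scores = [-1.0, -2.0, -1.0]  # every partition contradicts the clade
combined_score = 3.0                  # the combined matrix supports it

print(hidden_support(combined_score, separate_scores))
```

A positive result means that support emerges only when the partitions are analyzed together, which is the signal that an analysis restricted to isolated gene fragments could miss.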
We document several remarkable cases of hidden support in both supermatrix and coalescence paradigms and argue that in most instances, the emergent support in the shortcut coalescence analyses is an artifact. By referencing rigorous molecular clock studies of Mammalia, we suggest that inaccurate gene trees that imply unrealistically deep coalescences debilitate shortcut coalescence analyses of the placental dataset. We document contradictory coalescence results for Placentalia, and outline a critical conundrum that challenges the general utility of shortcut coalescence methods at deep phylogenetic scales. In particular, the basic unit of analysis in coalescence analysis, the coalescence-gene, is expected to shrink in size as more taxa are analyzed, but as the amount of data for reconstruction of a gene tree ratchets downward, the number of nodes in the gene tree that need to be resolved ratchets upward. Some advocates of shortcut coalescence methods have attempted to address problems with inaccurate gene trees by concatenating multiple coalescence-genes to yield "gene trees" that better match the species tree. However, this hybrid concatenation/coalescence approach, “concatalescence,” contradicts the most basic biological rationale for performing a coalescence analysis in the first place. We discuss this reality in the context of recent simulation work that also suggests inaccurate reconstruction of gene trees is more problematic for shortcut coalescence methods than deep coalescence of independently segregating loci is for concatenation methods.
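One half of the conundrum described above, the growth in the number of nodes to be resolved as taxa are added, can be quantified with standard tree combinatorics. The sketch below assumes fully resolved (binary) unrooted gene trees, for which n taxa imply n - 3 internal branches and (2n - 5)!! possible topologies. These are general combinatorial facts, not figures from the paper, and the function names are hypothetical. The expected shrinkage of the coalescence-gene is not modeled here because the abstract gives no formula for it.

```python
def internal_branches(n_taxa: int) -> int:
    """Internal branches in a fully resolved unrooted tree (n - 3)."""
    if n_taxa < 3:
        raise ValueError("an unrooted tree needs at least 3 taxa")
    return n_taxa - 3


def unrooted_topologies(n_taxa: int) -> int:
    """Number of distinct binary unrooted topologies, (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("an unrooted tree needs at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


for n in (4, 10, 20, 40):
    print(n, internal_branches(n), unrooted_topologies(n))
```

Under these assumptions, each added taxon adds one more internal branch that a gene tree must resolve, while the data available per coalescence-gene are expected to decrease.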

Journal: Molecular Phylogenetics and Evolution
Publication Date: Aug 22, 2014
Citations: 408

Similar Papers

The gene tree delusion
Mark S Springer ... John Gatesy
Molecular Phylogenetics and Evolution | VOL. 94
31 Jul 2015

Estimation of Species Trees
Diego Mallo ... David Posada
14 Nov 2014

Gene tree reconstruction and orthology analysis based on an integrated model for duplications and sequence evolution
Lars Arvestad ... Jens Lagergren
01 Jan 2004

The inference of gene trees with species trees.
Gergely J Szöllősi ... Eric Tannier
Systematic Biology | VOL. 64
28 Jul 2014
