Abstract

Genome-scale species tree inference is largely restricted to heuristic approaches that use estimated gene trees to reconstruct species-level relationships. Central to these heuristic species tree methods is the assumption that the gene trees are estimated without error. To increase the accuracy of the input gene trees used to infer species trees, several techniques have recently been developed for constructing longer “supergenes” that represent sets of loci inferred to share the same genealogical history. Although these supergene methods aim to improve gene tree accuracy by concatenating several loci, thereby increasing the amount of data available for gene tree estimation, no formal protocols have been proposed to validate this key concatenation step. In a recent study, we developed several supergene validation strategies for assessing the accuracy of a popular supergene method: the so-called “statistical binning” pipeline. In this article, we describe a more generalizable, model-based “supergene validation” protocol for assessing the accuracy of supergenes and supergene methods using model-based tests of phylogenetic congruence.

  • Supergenes are validated by adopting model-based tests of topological congruence

  • These model-based procedures outperform non-model-based methods for supergene construction

  • The results of this protocol can be used to assess the overall performance of a supergene method across a phylogenomic dataset

Highlights

  • These approaches typically implement a two-part procedure whereby individual genealogical trees are first estimated for each genomic locus using maximum likelihood (ML) analyses, and the resulting gene tree estimates are then used as input to reconstruct a species tree under the multispecies coalescent model using programs such as MP-EST [1], ASTRAL [2], ASTRID [3], or STEM [4]

  • While our primary goal is not to review in detail all possible phylogenetic tests that could be used for such a purpose, we provide several tools that proved useful for assessing supergene validation in our original study [15], and we mention additional techniques that could foreseeably be used for supergene validation in a similar manner

  • In our original demonstration of supergene validation using the avian phylogenomic analysis [15], we used the likelihood ratio test (LRT) framework implemented in the program Concaterpillar [26], which conducts a series of hierarchical LRTs to test the total number of distinct trees supported by the inferred supergene


Summary

Introduction

For the purpose of this article, and to provide the same context as our original supergene validation study [15], we primarily discuss the use of our supergene validation protocol for assessing the accuracy of the statistical binning method that was used to infer supergenes for the avian phylogenomic analyses [11,21,22,23]. After a supergene (or set of supergenes) has been inferred (either based on a priori assumptions or via a more formal supergene method; Fig. 1a), the goal is to test whether the individual loci placed within a supergene should be treated as a single concatenated locus with a single phylogenetic tree topology (i.e., a “true supergene”) or not (i.e., a “false supergene”).
