Large Phylogenies Research Articles

Abstract Phylogenies with extensive taxon sampling have become indispensable for many types of ecological and evolutionary studies. Many large‐scale trees are based on a ‘supermatrix’ approach, which involves amalgamating thousands of published sequences for a group. Constructing up‐to‐date supermatrices can be challenging, especially as new sequences may become available almost constantly. Additionally, genomic datasets (composed of thousands of loci) are becoming common in phylogenetics and phylogeography, and present novel challenges for constructing such datasets. Here we present SuperCRUNCH, a Python toolkit for assembling large phylogenetic datasets. It can be applied to GenBank sequences, unpublished sequences or combinations of GenBank and unpublished data. SuperCRUNCH constructs local databases and uses them to conduct rapid searches for user‐specified sets of taxa and loci. Sequences are parsed into putative loci and passed through rigorous filtering steps. A post‐filtering step allows for selection of one sequence per taxon (i.e. species‐level supermatrix) or retention of all sequences per taxon (i.e. population‐level dataset). Importantly, SuperCRUNCH can generate ‘vouchered’ population‐level datasets, in which voucher information is used to generate multi‐locus phylogeographic datasets. SuperCRUNCH offers many options for taxonomy resolution, similarity filtering, sequence selection, alignment and file manipulation. We demonstrate the range of features available in SuperCRUNCH by generating a variety of phylogenetic datasets. Output datasets include traditional species‐level supermatrices, large‐scale phylogenomic matrices and phylogeographic datasets. Finally, we briefly compare the ability of SuperCRUNCH to construct species‐level supermatrices relative to alternative approaches. 
SuperCRUNCH generated a large‐scale supermatrix (1,400 taxa and 66 loci) from 16 GB of GenBank data in ~1.5 hr, and generated population‐level datasets (<350 samples, <10 loci) in <1 min. It outperformed alternative methods for supermatrix construction in terms of taxa, loci and sequences recovered. SuperCRUNCH is a modular bioinformatics toolkit that can be used to assemble datasets for any taxonomic group and scale (kingdoms to individuals). It allows rapid construction of supermatrices, greatly simplifying the process of updating large phylogenies with new data. It is also designed to produce population‐level datasets. SuperCRUNCH streamlines the major tasks required to process phylogenetic data, including filtering, alignment, trimming and formatting. SuperCRUNCH is open‐source, documented and available at https://github.com/dportik/SuperCRUNCH.
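The post-filtering choice described in the abstract (one sequence per taxon for a species-level supermatrix, or all sequences per taxon for a population-level dataset) can be illustrated with a minimal Python sketch. This is not SuperCRUNCH's code or API. The function name `select_sequences`, the `(taxon, locus, sequence)` record layout, and the "keep the longest sequence" criterion are all assumptions made for illustration.

```python
from collections import defaultdict


def select_sequences(records, mode="species"):
    """Group sequences by (taxon, locus) and apply a selection mode.

    records: iterable of (taxon, locus, sequence) tuples (hypothetical layout).
    mode="species":    keep one sequence per taxon and locus. Here the longest
                       is kept, which is an assumed criterion.
    mode="population": keep every sequence per taxon and locus.
    """
    grouped = defaultdict(list)
    for taxon, locus, seq in records:
        grouped[(taxon, locus)].append(seq)

    if mode == "population":
        return {key: list(seqs) for key, seqs in grouped.items()}
    if mode == "species":
        # max() returns the first of any equally long sequences
        return {key: [max(seqs, key=len)] for key, seqs in grouped.items()}
    raise ValueError(f"unknown mode: {mode!r}")


# Toy records with made-up sequences, for illustration only
example_records = [
    ("Uma notata", "CYTB", "ACGT"),
    ("Uma notata", "CYTB", "ACGTAC"),
    ("Uma scoparia", "CYTB", "ACG"),
]
species_level = select_sequences(example_records, mode="species")
population_level = select_sequences(example_records, mode="population")
```

In this sketch the species-level result holds a single sequence for each taxon and locus, while the population-level result retains both `Uma notata` sequences.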

The gaudy grasshopper family Pyrgomorphidae (Orthoptera: Caelifera) shows a peculiar geographical distribution. Of the 487 described species, less than 10% of the diversity is found in the New World, while the rest occur throughout Africa, Asia, and Australia. Only 41 species belonging to four tribes are found in Central and South America and the Dominican Republic, and the phylogenetic positions of these taxa within the large phylogeny of Pyrgomorphidae and the relationships among them have never been investigated. Regarding the biogeography, three different hypotheses about the origin of the New World Pyrgomorphidae have been proposed, but these have not been empirically tested. In this study, we present the first molecular phylogeny of Pyrgomorphidae that includes members of all four New World tribes and representative genera from the Old World, based on entire mitochondrial genomes and four nuclear genes, to investigate the biogeography of this fascinating lineage. Our results recover Pyrgomorphidae as monophyletic and the New World Pyrgomorphidae as a paraphyletic group comprising three clades: (1) the Caribbean Jaragua + the South American Algete; (2) the Mexican and Central American Sphenarium + Prosphena; and (3) the Mexican lineages Ichthiacridini + Ichthyotettigini. The divergence time estimation analysis suggested that the Pyrgomorphidae diverged from its relatives in the Early Cretaceous (139–104 mya). The biogeographic analysis using BioGeoBEARS showed that after diversifying in the Old World, the first New World Pyrgomorphidae clade (Algete + Jaragua) diverged 96 mya (Late Cretaceous, Cenomanian) and that their current distribution in the New World is explained by one of two possible events: a transatlantic colonization from Africa to Northern South America or a vicariance event between these two landmasses, followed by a subsequent dispersal to the Caribbean. 
The second wave of colonization occurred about 69 mya towards the end of the Late Cretaceous (Maastrichtian) with dispersal from Africa to South America and then to North America with a subsequent diversification in Mexico including Baja California.

Articles published on Large Phylogenies

SuperCRUNCH: A bioinformatics toolkit for creating and manipulating supermatrices and other large phylogenetic datasets

Convergent adaptation of the genomes of woody plants at the land-sea interface.

Identification of Hidden Population Structure in Time-Scaled Phylogenies.

Simulating trees with millions of species.

A General and Efficient Algorithm for the Likelihood of Diversification and Discrete-Trait Evolutionary Models.

On the origin of the New World Pyrgomorphidae (Insecta: Orthoptera)

Tree reconciliation combined with subsampling improves large scale inference of orthologous group hierarchies

Distribution and asymptotic behavior of the phylogenetic transfer distance

V.PhyloMaker: an R package that can generate very large phylogenies for vascular plants

A Machine Learning Method for Detecting Autocorrelation of Evolutionary Rates in Large Phylogenies

Predicting the Impact of Describing New Species on Phylogenetic Patterns.

Phylogenies and Diversification Rates: Variance Cannot Be Ignored.

High-Throughput Reconstruction of Ancestral Protein Sequence, Structure, and Molecular Function.

Bacterial diversification through geological time.

Corrigendum for Rojas et al. (2018) DOI: 10.1111/ele.12911.

BCD Beam Search: considering suboptimal partial solutions in Bad Clade Deletion supertrees.

Lorenzo Camerano (1856–1917) and his contribution to large mammal phylogeny and taxonomy, with particular reference to the genera Capra, Rupicapra and Rangifer

Can we build it? Yes we can, but should we use it? Assessing the quality and value of a very large phylogeny of campanulid angiosperms

Chromploid: An R package for chromosome number evolution across the plant tree of life.

Constructing a broadly inclusive seed plant phylogeny.
