Abstract
Rapidly growing biological data—including molecular sequences and fossils—hold an unprecedented potential to reveal how evolutionary processes generate and maintain biodiversity. However, researchers often have to develop their own idiosyncratic workflows to integrate and analyze these data for reconstructing time-calibrated phylogenies. In addition, divergence times estimated under different methods and assumptions, and based on data of various quality and reliability, should not be combined without proper correction. Here we introduce a modular framework termed SUPERSMART (Self-Updating Platform for Estimating Rates of Speciation and Migration, Ages, and Relationships of Taxa), and provide a proof of concept for dealing with the moving targets of evolutionary and biogeographical research. This framework assembles comprehensive data sets of molecular and fossil data for any taxa and infers dated phylogenies using robust species tree methods, also allowing for the inclusion of genomic data produced through next-generation sequencing techniques. We exemplify the application of our method by presenting phylogenetic and dating analyses for the mammal order Primates and for the plant family Arecaceae (palms). We believe that this framework will provide a valuable tool for a wide range of hypothesis-driven research questions in systematics, biogeography, and evolution. SUPERSMART will also accelerate the inference of a “Dated Tree of Life” where all node ages are directly comparable.
Highlights
(PHLAWD, http://phlawd.net/) (Smith et al. 2009). This pipeline adds candidate sequences to a user-provided set of seed sequences, as long as their reciprocal BLAST hit overlap is sufficient. This results in data sets that are taxonomically broader than those obtained by PhyLoTA. A drawback of this approach is that only requested markers are collected, so no unrequested regions are retrieved even if they contain phylogenetic information.
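The overlap criterion described above can be illustrated with a minimal sketch. It assumes BLAST-style 1-based, inclusive hit coordinates; the function names, the dictionary layout, and the 0.5 coverage threshold are hypothetical choices made for illustration and are not taken from PHLAWD itself.

```python
def coverage(start, end, length):
    """Fraction of a sequence covered by a hit (1-based, inclusive coordinates)."""
    # abs() tolerates reverse-strand hits, where start > end.
    return (abs(end - start) + 1) / length


def reciprocal_overlap_ok(hit, query_len, subject_len, min_cov=0.5):
    """Accept a candidate only if the hit covers enough of BOTH sequences.

    `hit` is a dict with keys qstart, qend, sstart, send.
    `min_cov` is an assumed threshold, not a documented default.
    """
    q_cov = coverage(hit["qstart"], hit["qend"], query_len)
    s_cov = coverage(hit["sstart"], hit["send"], subject_len)
    return q_cov >= min_cov and s_cov >= min_cov


hit = {"qstart": 1, "qend": 80, "sstart": 11, "send": 90}
accepted_short = reciprocal_overlap_ok(hit, query_len=100, subject_len=100)
accepted_long = reciprocal_overlap_ok(hit, query_len=100, subject_len=400)
```

In this toy case the same hit passes against a 100-bp candidate but fails against a 400-bp one, because it covers only a fifth of the longer sequence.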
Anyone who seeks to download 16S sequences from GenBank will encounter a near-endless array of orthographic and conceptual variations such as “16 S,” “16S,” “17S,” “SSU,” “ribosomal small subunit,” and “ribosomal small sub-unit”. Although workflows such as those implemented in PhyLoTA or PHLAWD are useful for assembling multiple sequence alignments, they do not by themselves create multilocus supermatrices with optimally broad taxon coverage.
Two main approaches have been developed to take advantage of the sequencing and phylogenetic efforts made so far, both of which have the capacity to handle very large numbers of terminal taxa: (i) supertrees, which involve the fusion of separate trees with at least some degree of taxonomic overlap, under parsimony, maximum likelihood, or Bayesian approaches (e.g., Bininda-Emonds et al. 1999; Nguyen et al. 2012, and references therein); and (ii) supermatrices, which are data sets containing sets of markers that share at least some taxa.
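To make the supermatrix idea concrete, the following is a minimal sketch, assuming each marker has already been aligned and is held as a mapping from taxon name to aligned sequence. The function name, the toy data, and the use of "?" for missing data are illustrative assumptions, not part of SUPERSMART.

```python
def build_supermatrix(alignments):
    """Concatenate per-marker alignments into one supermatrix.

    alignments: dict mapping marker name -> dict of taxon -> aligned sequence.
    Taxa absent from a marker are padded with '?' (treated as missing data).
    """
    taxa = sorted({t for aln in alignments.values() for t in aln})
    matrix = {t: [] for t in taxa}
    for marker in sorted(alignments):
        aln = alignments[marker]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"marker {marker!r} is empty or not aligned")
        width = lengths.pop()
        for t in taxa:
            matrix[t].append(aln.get(t, "?" * width))
    return {t: "".join(parts) for t, parts in matrix.items()}


# Toy input: two markers that share only one taxon.
toy = {
    "cytb": {"Homo_sapiens": "ATGC", "Pan_troglodytes": "ATGT"},
    "16S": {"Pan_troglodytes": "GG", "Macaca_mulatta": "GA"},
}
supermatrix = build_supermatrix(toy)
```

Markers are concatenated in sorted name order, so every row has the same length, and the shared taxon is what links the two partitions.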
Summary
(PHLAWD, http://phlawd.net/) (Smith et al. 2009). This pipeline adds candidate sequences (identified by querying GenBank records for user-specified gene name annotations) to a user-provided set of seed sequences, as long as their reciprocal BLAST hit overlap is sufficient (this latter step is comparable to how PhyLoTA filters candidate cluster members). The package allows users to generate custom-made sets of robustly inferred, dated trees for further analyses, or to assemble aligned DNA data sets representing optimal combinations of sequenced genes/markers and taxa (see Fig. 1 for a comparison of SUPERSMART with supertree and supermatrix approaches). As the standard goal of SUPERSMART is to infer species-level time-calibrated trees (lower taxonomic levels are also supported), these clusters of sequences are reduced to more manageable data sets, containing approximately equal numbers of sequences for each species.
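The reduction of a sequence cluster to roughly equal numbers of sequences per species could look like the sketch below. The summary does not state how sequences are chosen, so keeping the longest ones is an assumption made here for illustration; the function name, the record layout, and the `max_per_species` parameter are likewise hypothetical.

```python
from collections import defaultdict


def thin_cluster(records, max_per_species=2):
    """Keep at most `max_per_species` sequences for each species.

    records: iterable of (species, sequence_id, sequence) tuples.
    Assumed selection rule: prefer longer sequences, break ties by ID.
    """
    by_species = defaultdict(list)
    for species, seq_id, seq in records:
        by_species[species].append((seq_id, seq))

    kept = []
    for species in sorted(by_species):
        ranked = sorted(by_species[species], key=lambda r: (-len(r[1]), r[0]))
        kept.extend((species, sid, s) for sid, s in ranked[:max_per_species])
    return kept


cluster = [
    ("Homo_sapiens", "h1", "ATGCAT"),
    ("Homo_sapiens", "h2", "ATG"),
    ("Homo_sapiens", "h3", "ATGCATGC"),
    ("Pan_troglodytes", "p1", "ATGC"),
]
thinned = thin_cluster(cluster, max_per_species=2)
kept_ids = [seq_id for _, seq_id, _ in thinned]
```

Capping the count per species keeps heavily sequenced taxa from dominating the data set while leaving sparsely sampled species untouched.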