SPEDE-sampler: An R Shiny application to assess how methodological choices and taxon sampling can affect Generalized Mixed Yule Coalescent output and interpretation.

Clarke J M Steenderen, Guy F Sutton

doi:10.1111/1755-0998.13591

Abstract

Species delimitation tools are vital to taxonomy and the discovery of new species. These tools can use genetic data to estimate species boundaries, and one of the most widely used methods is the Generalized Mixed Yule Coalescent (GMYC) model. Despite the model's popularity, a number of factors are known to influence its performance and the resulting inferences. Moreover, the few studies that have assessed model performance to date have been predominantly based on simulated data sets, where model assumptions are not violated. Here, we present a user‐friendly R Shiny application, ‘SPEDE‐sampler’ (SPEcies DElimitation sampler), that assesses the effect of computational and methodological choices, in combination with sampling effects, on the GMYC model. Output phylogenies are used to test the effect that (1) sample size, (2) BEAST and GMYC parameters (e.g. prior settings, single vs. multiple threshold, clock model), and (3) singletons have on GMYC output. Optional predefined grouping information (e.g. morphospecies/ecotypes) can be uploaded in order to compare it with GMYC species and estimate percentage match scores. Additionally, predefined groups that contribute to inflated species richness estimates are identified by SPEDE‐sampler, allowing for the further investigation of potential cryptic species or geographical substructuring in those groups. Merging by the GMYC is also recorded to identify where traditional taxonomy has overestimated species numbers. Four worked examples are provided to illustrate the functionality of the program's workflow, and the variation that can arise when applying the GMYC model to empirical data sets. The R Shiny program is available for download at https://github.com/clarkevansteenderen/spede_sampler_R.

Full Text