Abstract

The resolution of the Tree of Life has accelerated with advances in DNA sequencing technology. To achieve dense taxon sampling, it is often necessary to obtain DNA from historical museum specimens to supplement modern genetic samples. However, DNA from historical material is generally degraded, which presents various challenges. In this study, we evaluated how the coverage at variant sites and missing data among historical and modern samples impact phylogenomic inference. We explored these patterns in the brush-tongued parrots (lories and lorikeets) of Australasia by sampling ultraconserved elements in 105 taxa. Trees estimated with low-coverage characters had several clades where relationships appeared to be influenced by whether the sample came from historical or modern specimens; these groupings were not observed when more stringent filtering was applied. To assess whether the topologies were affected by missing data, we performed an outlier analysis of sites and loci, and a data reduction approach where we excluded sites based on data completeness. Depending on the outlier test, 0.15% of total sites or 38% of loci were driving the topological differences among trees, and at these sites, historical samples had 10.9× more missing data than modern ones. In contrast, 70% data completeness was necessary to avoid spurious relationships. Predictive modeling found that outlier analysis scores were correlated with parsimony-informative sites in the clades whose topologies changed the most by filtering. After accounting for biased loci and understanding the stability of relationships, we inferred a more robust phylogenetic hypothesis for lories and lorikeets.
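
The data reduction approach mentioned above, excluding sites based on data completeness, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy alignment, the taxon labels, and the set of characters treated as missing are all assumptions made for illustration. Only the 70% threshold is taken from the abstract.

```python
# Minimal sketch of completeness-based site filtering (illustrative only).
# Assumption: an alignment is a dict of equal-length strings keyed by taxon,
# and the characters "N", "-" and "?" all count as missing data.

def site_completeness(alignment, missing="N-?"):
    """Return, for each column, the fraction of taxa with a non-missing character."""
    sequences = list(alignment.values())
    n_taxa = len(sequences)
    length = len(sequences[0])
    return [
        sum(seq[i] not in missing for seq in sequences) / n_taxa
        for i in range(length)
    ]

def filter_by_completeness(alignment, min_completeness=0.7, missing="N-?"):
    """Keep only the columns whose completeness is at least min_completeness."""
    scores = site_completeness(alignment, missing)
    keep = [i for i, score in enumerate(scores) if score >= min_completeness]
    return {
        taxon: "".join(seq[i] for i in keep)
        for taxon, seq in alignment.items()
    }

# Hypothetical toy alignment: two modern and two historical samples.
toy_alignment = {
    "modern_A": "ACGTAC",
    "modern_B": "ACGTAT",
    "historical_C": "AC-TNN",
    "historical_D": "ANGT-N",
}

filtered = filter_by_completeness(toy_alignment, min_completeness=0.7)
for taxon, seq in filtered.items():
    print(taxon, seq)
```

In this toy case the last two columns are present in only half of the taxa and are dropped, while the columns missing from a single historical sample are retained.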

Highlights

  • Historical and ancient DNA from museum specimens is widely employed for incorporating rare and extinct taxa into phylogenetic studies (e.g., Thomas et al. 1989; Mitchell et al. 2014; Fortes et al. 2016)

  • We showed that systematic bias caused by missing informative sites between DNA sequences from modern versus historical specimens can produce aberrant or unstable phylogenetic relationships

  • Our analyses suggest that an asymmetry in phylogenetic information content among sample types is the primary cause of the bias because only 3,142 sites (6.6% of total sites) drove the topological differences among trees, and the historical samples had 7.5× more missing data at these sites

Summary

Introduction

Historical and ancient DNA from museum specimens is widely employed for incorporating rare and extinct taxa into phylogenetic studies (e.g., Thomas et al. 1989; Mitchell et al. 2014; Fortes et al. 2016). Although some studies use only DNA sequences collected from historical or ancient samples (e.g., Hung et al. 2014), most phylogenetic approaches involving non-contemporaneous samples combine these sequences with those from modern samples. For studies that use DNA from both sample types, additional challenges in downstream analyses may arise due to an asymmetry in the phylogenetic signal caused by non-random missing data (e.g., Hosner et al. 2016).

Shorter loci are potentially problematic because the sequence capture approach targets the invariable UCE core, limiting the portion of the flanking region that contains polymorphic sites. Another factor that may cause differences among historical and modern samples is that phylogenomic pipelines that do not involve variant calling typically employ read-specific filtering, where the average read depth across all sites in a locus is used to determine whether the locus is excluded.
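
To make the contrast concrete, the sketch below compares a locus-level filter based on average read depth with a site-level filter that masks individual low-depth positions. It is an illustration under stated assumptions, not the method of any particular pipeline: the function names, the depth threshold of 6, and the toy depth profile (deep coverage over the core, shallow coverage in the flanks) are all hypothetical.

```python
# Minimal sketch contrasting locus-level and site-level depth filtering.
# All names, thresholds, and data here are hypothetical illustrations.
from statistics import mean

def locus_passes_mean_depth(site_depths, min_mean_depth):
    """Locus-level filter: keep the whole locus if its average depth is high enough."""
    return mean(site_depths) >= min_mean_depth

def mask_low_depth_sites(sequence, site_depths, min_site_depth, missing="N"):
    """Site-level filter: replace each base whose own depth is too low with `missing`."""
    return "".join(
        base if depth >= min_site_depth else missing
        for base, depth in zip(sequence, site_depths)
    )

# Toy locus: the conserved core is deeply covered, the variable flanks are not.
sequence = "ACGTACGTAC"
depths = [2, 3, 40, 45, 50, 48, 42, 3, 2, 1]

# The locus-level filter keeps every site because the core inflates the average.
print(locus_passes_mean_depth(depths, min_mean_depth=6))

# The site-level filter masks the shallow flanking positions individually.
print(mask_low_depth_sites(sequence, depths, min_site_depth=6))
```

In this toy example the locus passes the average-depth filter even though half of its positions, all in the flanks, fall below the per-site threshold. This is the kind of low-coverage flanking site that a locus-level average can let through.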
