Abstract

Phylogenetic datasets are now commonly generated using short-read sequencing technologies unhampered by degraded DNA, such as that often extracted from herbarium specimens. The compatibility of these methods with herbarium specimens has precipitated an increase in broad sampling of herbarium specimens for inclusion in phylogenetic studies. Understanding which sample characteristics are predictive of sequencing success can guide researchers in the selection of tissues and specimens most likely to yield good results. Multiple recent studies have considered the relationship between sample characteristics and DNA yield and sequence capture success. Here we report an analysis of the relationship between sample characteristics and sequencing success for nearly 8,000 herbarium specimens. This study, the largest of its kind, is also the first to include a measure of specimen quality (“greenness”) as a predictor of DNA sequencing success. We found that taxonomic group and source herbarium are strong predictors of both DNA yield and sequencing success and that the most important specimen characteristics for predicting success differ for DNA yield and sequencing: greenness was the strongest predictor of DNA yield, and age was the strongest predictor of proportion-on-target reads recovered. Surprisingly, the relationship between age and proportion-on-target reads is the inverse of expectations; older specimens performed slightly better in our capture-based protocols. We also found that DNA yield itself is not a strong predictor of sequencing success. Most literature on DNA sequencing from herbarium specimens considers specimen selection for optimal DNA extraction success, which we find to be an inappropriate metric for predicting success using next-generation sequencing technologies.

Highlights

  • Herbarium specimens and short-read Next-Generation Sequencing (NGS) are natural partners

  • The dataset we present comprises 7,608 specimens that are a random subset of the 15,000 herbarium specimens collected as part of the Nitfix project, a study that includes ∼50% of the species diversity of the nitrogen-fixing clade

  • Source herbarium, the factor most difficult to control, was the strongest predictor of DNA yield and sequencing success (Supplementary Table 2). These results suggest that, for projects that sample massively across wide phylogenetic breadth, prioritizing specimen quality factors may be less worthwhile than adjusting the broader project design to promote sequencing success. In practice this tradeoff can rarely be considered, because the selection of plant families and herbarium collections is determined by project goals and sampling efficiencies



Introduction

Herbarium specimens and short-read Next-Generation Sequencing (NGS) are natural partners. Short-read NGS methods are compatible with the low input DNA quantities and fragmented DNA molecules that may result from specimen degradation over time (Pyle and Adams, 1989; Savolainen et al., 1995). The broader adoption of short-read NGS methods by much of the phylogenetics community repositions herbarium collections as the primary source for generating low-copy nuclear DNA datasets (Hart et al., 2016; Zeng et al., 2018; Brewer et al., 2019), enabling comprehensive, global-scale plant phylogenies and other molecular applications (e.g., Brewer et al., 2019; Folk et al., 2021).
