Abstract
Background: Herbaria are valuable sources of extensive curated plant material that are now accessible to genetic studies because of advances in high-throughput, next-generation sequencing methods. As an applied assessment of large-scale recovery of plastid and ribosomal genome sequences from herbarium material for plant identification and phylogenomics, we sequenced 672 samples covering 21 families, 142 genera and 530 named and proposed named species. We explored the impact of parameters such as sample age, DNA concentration and quality, read depth and fragment length on plastid assembly error. We also tested the efficacy of DNA sequence information for identifying plant samples using 45 specimens recently collected in the Pilbara.
Results: Genome skimming was effective at producing genomic information at large scale. Substantial sequence information on the chloroplast genome was obtained from 96.1% of samples, and complete or near-complete sequences of the nuclear ribosomal RNA gene repeat were obtained from 93.3% of samples. We were able to extract sequences for the core DNA barcode regions rbcL and matK from 96% and 93.3% of samples, respectively. Read quality and DNA fragment length had significant effects on sequencing outcomes, and error correction of reads proved essential. Assembly problems were specific to certain taxa with low GC and high repeat content (Goodenia, Scaevola, Cyperus, Bulbostylis, Fimbristylis), suggesting biological rather than technical explanations. The structure of related genomes was needed to guide the assembly of repeats that exceeded the read length. DNA-based matching proved highly effective and showed that the efficacy for species identification declined in the order cpDNA >> rDNA > matK >> rbcL.
Conclusions: We showed that a large-scale approach to genome sequencing using herbarium specimens produces high-quality complete cpDNA and rDNA sequences as a source of data for DNA barcoding and phylogenomics.
Highlights
Herbaria are valuable sources of extensive curated plant material that are accessible to genetic studies because of advances in high-throughput, next-generation sequencing methods
We demonstrated that a large-scale approach to genome sequencing of herbarium specimens can produce a large dataset of complete chloroplast DNA (cpDNA) and rDNA sequences, and that the data generated can be used for species identification and phylogenomics
In this study, we have shown that whole chloroplast and ITS rDNA data can readily be produced at scale from herbarium specimens and used for a range of applications
Summary
Herbaria are valuable sources of extensive curated plant material that are accessible to genetic studies because of advances in high-throughput, next-generation sequencing methods. Large-scale studies with broad taxonomic sampling are lacking but needed, given the future importance of herbaria for the systematic development of reference barcode databases [2]. This project used recent developments in full genome sequencing to provide a DNA sequence database of a key set of the Pilbara flora, and provides a proof of concept as an initial stage in the development of an effective large-scale, DNA-based species identification system for the Pilbara bioregion. Development of an improved knowledge base for the Pilbara flora will deliver improved reliability and efficiency of plant identifications for environmental impact assessments and associated regulatory land use planning approval processes.