Abstract
Until now, the potential of next-generation sequencing (NGS) for the construction of barcode libraries or integrative taxonomy has seldom been realised. Here, we amplified (two-step PCR) and simultaneously sequenced (MiSeq) multiple markers from hundreds of fig wasp specimens. We also developed a workflow for quality control of the data. Illumina sequences were compared with Sanger sequences accumulated over the past years. Interestingly, primers and PCR conditions used for the Sanger approach did not require optimisation to construct the MiSeq library. After quality controls, 87% of the species (76% of the specimens) had a valid MiSeq sequence for each marker. Importantly, major clusters did not always correspond to the targeted loci. Nine specimens exhibited two divergent sequences (up to 10%). In 95% of the species, MiSeq and Sanger sequences obtained from the same sampling were similar. For the remaining 5%, species were paraphyletic or the sequences clustered into divergent groups on the Sanger + MiSeq trees (>7%). These problematic cases may represent coding NUMTs or heteroplasmy. Our results illustrate that Illumina approaches are not artefact-free and confirm that Sanger databases can contain non-target genes. This highlights the importance of quality controls, working with taxonomists and using multiple markers for DNA-taxonomy or species diversity assessment.
Highlights
Until now, the potential of next-generation sequencing (NGS) for the construction of barcode libraries or integrative taxonomy has been seldom realised
While next-generation sequencing (NGS) is commonly used to analyse bulk environmental samples[1,2,3], Sanger sequencing remains the standard approach in generating DNA barcode libraries[4]
A polymerase chain reaction (PCR) amplification product was observed for 80.9% of the species for cytochrome c oxidase subunit I (COI)-long, 86.1% for COI-short, 85.2% for cytochrome b (Cytb), and 77.4% for EF
Summary
The potential of next-generation sequencing (NGS) for the construction of barcode libraries or integrative taxonomy has seldom been realised. Our results illustrate that Illumina approaches are not artefact-free and confirm that Sanger databases can contain non-target genes. This highlights the importance of quality controls, working with taxonomists and using multiple markers for DNA-taxonomy or species diversity assessment. While NGS is commonly used to analyse bulk environmental samples (metabarcoding)[1,2,3], Sanger sequencing remains the standard approach in generating DNA barcode libraries[4]. This is unfortunate, as the cost-effective acquisition of barcode sequences from hundreds of specimens identified to species by expert taxonomists could accelerate the construction of accurate reference libraries and increase their completeness[2,5].