Abstract

PREMISE: Universal target enrichment kits maximize utility across wide evolutionary breadth while minimizing the number of baits required to create a cost-efficient kit. The Angiosperms353 kit has been successfully used to capture loci throughout the angiosperms, but the default target reference file includes sequence information from only 6–18 taxa per locus. Consequently, reads sequenced from on-target DNA molecules may fail to map to references, resulting in fewer on-target reads for assembly and reducing locus recovery.

METHODS: We expanded the Angiosperms353 target file, incorporating sequences from 566 transcriptomes to produce a ‘mega353’ target file, with each locus represented by 17–373 taxa. This mega353 file is a drop-in replacement for the original Angiosperms353 file in HybPiper analyses. We provide tools to subsample the file based on user-selected taxon groups, and to incorporate other transcriptome or protein-coding gene data sets.

RESULTS: Compared to the default Angiosperms353 file, the mega353 file increased the percentage of on-target reads by an average of 32%, increased locus recovery at 75% length by 49%, and increased the total length of the concatenated loci by 29%.

DISCUSSION: Increasing the phylogenetic density of the target reference file results in improved recovery of target capture loci. The mega353 file and associated scripts are available at: https://github.com/chrisjackson-pellicle/NewTargets.

Highlights

  • PREMISE: Universal target enrichment kits maximize utility across wide evolutionary breadth while minimizing the number of baits required to create a cost-efficient kit

  • Sequence number and phylogenetic density in the default353 target file compared to the mega353 target file

  • Phylogenetic density is higher in the mega353 target file: the default353 target file has an average of 13.5 orders and 13.5 families per locus, whereas the mega353 target file has an average of 49.8 orders and 170 families per locus
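The per-locus averages in the last highlight can be illustrated with a minimal Python sketch. It assumes the HybPiper-style header convention `>Taxon-Locus` in the target file. The function names, the toy sequences, and the `orders` lookup table are hypothetical and are not taken from the authors' scripts.

```python
from collections import defaultdict


def parse_headers(fasta_text):
    """Yield (taxon, locus) pairs from FASTA headers shaped like '>Taxon-Locus'."""
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue
        fields = line[1:].split()
        if not fields:
            continue
        # Split on the last hyphen so taxon names may themselves contain hyphens.
        taxon, sep, locus = fields[0].rpartition("-")
        if sep:
            yield taxon, locus


def mean_groups_per_locus(fasta_text, taxon_to_group):
    """Average number of distinct groups (e.g. orders) represented per locus.

    Taxa missing from taxon_to_group are counted under their own name.
    """
    groups = defaultdict(set)
    for taxon, locus in parse_headers(fasta_text):
        groups[locus].add(taxon_to_group.get(taxon, taxon))
    if not groups:
        return 0.0
    return sum(len(g) for g in groups.values()) / len(groups)


# Toy target file: three taxa for locus 4471, one taxon for locus 4527.
density_example = """>AAAA-4471
ATGGCA
>BBBB-4471
ATGGCT
>CCCC-4471
ATGGCC
>AAAA-4527
ATGTTT
"""

# Hypothetical taxon-code-to-order lookup table.
orders = {"AAAA": "Asterales", "BBBB": "Asterales", "CCCC": "Malvales"}

print(mean_groups_per_locus(density_example, orders))  # 1.5 orders per locus
print(mean_groups_per_locus(density_example, {}))      # 2.0 taxa per locus
```

Swapping the lookup table for one that maps taxon codes to families gives the family-level average in the same way.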


Summary

METHODS

To tailor the large mega353 target file to investigation-specific taxon sampling, we include the script filter_megatarget.py. This script can be used to create a filtered target file based on user-selected taxa or taxon groups, defined by unique 1KP transcriptome codes, families, orders, or clades (see https://github.com/chrisjackson-pellicle/NewTargets for full options). The script BYO_transcriptome.py takes as input a target file and a directory of transcriptomes and/or nucleotide sequences corresponding to protein-coding genes, and it can be used to expand target files from other bait kits. The Asteraceae target file (comprising only the H. annuus and L. sativa target sequences) was expanded using 1KP transcriptomes of taxa closely related to Asteraceae tribe Gnaphalieae (Appendix S1). The Hibisceae target file was expanded using available sequence data from the other Malvaceae subfamily Malvoideae tribes, Malveae and Gossypieae (Appendix S2).

McLay et al., New targets for the Angiosperms353 probe set (http://www.wileyonlinelibrary.com/journal/AppsPlantSci)
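The idea of subsampling a target file by taxon can be sketched in a few lines of Python. This is an illustration only, not the implementation or interface of filter_megatarget.py. It assumes HybPiper-style headers of the form `>Taxon-Locus`, and the function name `filter_target_file` and the toy taxon codes are hypothetical.

```python
def filter_target_file(fasta_text, keep_taxa):
    """Return only the FASTA records whose header taxon is in keep_taxa.

    Headers are assumed to look like '>Taxon-Locus'. The taxon is everything
    before the last hyphen of the first whitespace-delimited header field.
    """
    kept_lines = []
    keep = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            taxon = fields[0].rpartition("-")[0] if fields else ""
            keep = taxon in keep_taxa
        # Sequence lines inherit the decision made for their header.
        if keep and line.strip():
            kept_lines.append(line)
    return "\n".join(kept_lines) + ("\n" if kept_lines else "")


# Toy target file with two taxa and two loci.
filter_example = """>AAAA-4471
ATGC
GGTA
>BBBB-4471
ATGA
>AAAA-4527
TTGA
"""

# Keep only records from the (hypothetical) taxon code AAAA.
print(filter_target_file(filter_example, {"AAAA"}), end="")
```

Filtering by family, order, or clade would need an extra lookup from taxon code to that group, which this sketch leaves out.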
