Abstract

Background: Restriction site-associated DNA sequencing (RADseq) has revolutionized the study of wild organisms by allowing cost-effective genotyping of thousands of loci. However, for species lacking reference genomes, it can be challenging to select the restriction enzyme that offers the best balance between the number of obtained RAD loci and depth of coverage, which is crucial for a successful outcome. To address this issue, PredRAD was recently developed, which uses probabilistic models to predict restriction site frequencies from a transcriptome assembly or other sequence resource based on either GC content or mono-, di- or trinucleotide composition. This program generates predictions that are broadly consistent with estimates of the true number of restriction sites obtained through in silico digestion of available reference genome assemblies. However, in practice the actual number of loci obtained could potentially differ as incomplete enzymatic digestion or patchy sequence coverage across the genome might lead to some loci not being represented in a RAD dataset, while erroneous assembly could potentially inflate the number of loci. To investigate this, we used genome and transcriptome assemblies together with RADseq data from the Antarctic fur seal (Arctocephalus gazella) to compare PredRAD predictions with empirical estimates of the number of loci obtained via in silico digestion and from de novo assemblies.

Results: PredRAD yielded consistently higher predicted numbers of restriction sites for the transcriptome assembly relative to the genome assembly. The trinucleotide and dinucleotide models also predicted higher frequencies than the mononucleotide or GC content models. Overall, the dinucleotide and trinucleotide models applied to the transcriptome and the genome assemblies respectively generated predictions that were closest to the number of restriction sites estimated by in silico digestion. Furthermore, the number of de novo assembled RAD loci mapping to restriction sites was similar to the expectation based on in silico digestion.

Conclusions: Our study reveals generally high concordance between PredRAD predictions and empirical estimates of the number of RAD loci. This further supports the utility of PredRAD, while also suggesting that it may be feasible to sequence and assemble the majority of RAD loci present in an organism’s genome.

Highlights

  • In order to explore why only a proportion of the de novo assembled RAD loci successfully mapped to the reference genome, we focused on the de novo assembly using default parameters and regressed the number of samples in which each locus was assembled on mapping probability

  • One of the most important factors affecting the success of a restriction site-associated DNA sequencing (RADseq) project is the choice of restriction enzyme, as this determines the number of RAD loci that will be obtained and the depth of sequencing coverage that should result from a given sequencing effort

Introduction

Restriction site-associated DNA sequencing (RADseq) has revolutionized the study of wild organisms by allowing cost-effective genotyping of thousands of loci. For species lacking reference genomes, it can be challenging to select the restriction enzyme that offers the best balance between the number of RAD loci obtained and depth of coverage, which is crucial for a successful outcome. To address this issue, PredRAD was recently developed. It uses probabilistic models to predict restriction site frequencies from a transcriptome assembly or other sequence resource, based on either GC content or mono-, di- or trinucleotide composition. Assuming complete digestion and adequately deep and even sequencing across loci, the number of assembled RAD loci should be roughly twice the number of restriction sites, as enzymatic digestion at each site generates two fragments that are both partially sequenced (Fig. 1). Based on this number, it is possible to calculate the amount of sequencing required to obtain a specified average depth of coverage and, from there, to design an optimal sequencing strategy given the available budget.
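The arithmetic described above can be illustrated with a minimal Python sketch. It is not PredRAD's implementation: it only reproduces the simplest idea (a GC content model in which each base is drawn independently), followed by the "two loci per restriction site" rule and a reads-required calculation under the stated assumption of complete digestion and even coverage. The function names, the example GC content, the genome size and the choice of SbfI (recognition sequence CCTGCAGG) are illustrative assumptions, not values taken from the study.

```python
import math


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def site_probability(recognition_seq: str, gc: float) -> float:
    """Probability that a given position starts the recognition sequence.

    Simplified GC content model (an assumption of this sketch): bases are
    independent, with P(G) = P(C) = gc / 2 and P(A) = P(T) = (1 - gc) / 2.
    """
    p = 1.0
    for base in recognition_seq.upper():
        p *= gc / 2 if base in "GC" else (1 - gc) / 2
    return p


def expected_sites(recognition_seq: str, gc: float, genome_size: int) -> float:
    """Expected number of restriction sites in a genome of the given size.

    For a palindromic recognition sequence such as CCTGCAGG, a site on one
    strand is the same site on the other, so strands are not counted twice.
    """
    positions = max(genome_size - len(recognition_seq) + 1, 0)
    return site_probability(recognition_seq, gc) * positions


def expected_rad_loci(n_sites: float) -> float:
    """Each cut site yields two sequenced fragment ends, hence two RAD loci."""
    return 2 * n_sites


def reads_required(n_loci: float, depth: float, n_samples: int = 1) -> int:
    """Total reads needed for a target mean depth per locus per sample,
    assuming reads are spread evenly across loci and samples."""
    return math.ceil(n_loci * depth * n_samples)


# Hypothetical inputs, chosen only to show how the pieces fit together.
example_gc = 0.41             # assumed genome-wide GC content
example_genome = 2_400_000_000  # assumed genome size in bp
sites = expected_sites("CCTGCAGG", example_gc, example_genome)
loci = expected_rad_loci(sites)
print(f"expected sites: {sites:.0f}, expected RAD loci: {loci:.0f}")
print(f"reads for 30x across 96 samples: {reads_required(loci, 30, 96)}")
```

In practice the read total would be inflated to allow for PCR duplicates, low-quality reads and uneven coverage, and the mono-, di- and trinucleotide models replace the per-base probabilities used here with frequencies estimated from the sequence resource.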
