Abstract

Background: RAD-seq is a powerful tool, increasingly used in population genomics. However, earlier studies have raised red flags regarding possible biases associated with this technique. In particular, polymorphism in restriction sites results in preferential sampling of closely related haplotypes, so that RAD data tend to underestimate genetic diversity.

Results: Here we (1) clarify the theoretical basis of this bias, highlighting the potential confounding effects of population structure and selection, (2) compare predictions with data obtained by in silico digestion of full genomes and (3) provide a proof of concept toward an ABC-based correction of the RAD-seq bias. Under a neutral and panmictic model, we confirm the previously established relationship between the true polymorphism and its RAD-based estimate, showing a more pronounced bias when polymorphism is high. Using more elaborate models, we show that selection, which results in heterogeneous levels of polymorphism along the genome, exacerbates the bias and leads to a more pronounced underestimation. Conversely, spatial genetic structure tends to reduce the bias. We compare the neutral and panmictic model with “ideal” empirical data (in silico RAD-sequencing) using full genomes from natural populations of the fruit fly Drosophila melanogaster and the fungus Schizophyllum commune, which harbour moderate and high genetic diversity, respectively. In D. melanogaster, the data fit the model predictions, but the small difference between the true and RAD polymorphism makes this comparison insensitive to deviations from the model. In the highly polymorphic fungus, the model captures a large part of the bias but makes inaccurate predictions. Accordingly, ABC corrections based on this model improve the estimates, albeit with some imprecision.

Conclusion: The RAD-seq underestimation of genetic diversity associated with polymorphism in restriction sites becomes more pronounced when polymorphism is high. In practice, this means that in many systems where polymorphism does not exceed 2%, the bias is of minor importance compared with other sources of uncertainty, such as heterogeneous base composition or technical artefacts. The neutral panmictic model provides a practical means to correct the bias through ABC, albeit with some imprecision. More elaborate ABC methods might integrate additional parameters, such as population structure and selection, but their opposite effects could hinder accurate corrections.

Electronic supplementary material: The online version of this article (doi:10.1186/s12862-016-0791-0) contains supplementary material, which is available to authorized users.

Highlights

  • Restriction Associated DNA sequencing (RAD-seq) is a powerful tool, increasingly used in population genomics

  • Partial correction of the RAD-seq bias through Approximate Bayesian Computation (ABC) under a neutral panmictic model: we explored the possibility of using simulations from the neutral panmictic model to correct, at least partially, the RAD-seq estimates of polymorphism

  • We first confirmed earlier findings based on simulations in a neutral and panmictic model: RAD-based estimates of diversity are lower than the true polymorphism, and this bias becomes more pronounced as the true polymorphism increases
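
The ABC idea in the highlights above can be sketched with a rejection sampler. This is only a minimal illustration under strong assumptions, not the authors' pipeline: the forward model `simulate_rad_pi` is a hypothetical toy that considers pairs of haplotypes with exponentially distributed coalescence times and keeps a pair only if no mutation has hit the restriction site on either lineage. The site length `SITE_LENGTH`, the uniform prior, the tolerance and all function names are assumptions chosen for the example.

```python
import math
import random

SITE_LENGTH = 8  # assumed restriction-site length in bp (hypothetical value)


def simulate_rad_pi(theta, n_pairs, rng):
    """Toy forward model: mean per-site diversity among haplotype pairs
    that both still carry the restriction site."""
    kept = []
    for _ in range(n_pairs):
        t = rng.expovariate(1.0)  # pairwise coalescence time (coalescent units)
        # The pair is "sequenced" only if no mutation hit the site since
        # the common ancestor, which is less likely for distant pairs.
        if rng.random() < math.exp(-theta * SITE_LENGTH * t):
            kept.append(theta * t)  # expected per-site divergence of the pair
    return sum(kept) / len(kept) if kept else 0.0


def abc_rejection(observed, n_draws, n_pairs, tolerance, prior_max, rng):
    """Rejection ABC: keep prior draws whose simulated statistic is close
    to the observed one."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, prior_max)  # assumed uniform prior
        if abs(simulate_rad_pi(theta, n_pairs, rng) - observed) <= tolerance:
            accepted.append(theta)
    return accepted


rng = random.Random(1)
true_theta = 0.05  # "true" polymorphism used to generate pseudo-observed data
observed_rad_pi = simulate_rad_pi(true_theta, 5000, rng)

posterior = abc_rejection(observed_rad_pi, n_draws=1000, n_pairs=1000,
                          tolerance=0.002, prior_max=0.15, rng=rng)
corrected_theta = sum(posterior) / len(posterior)

print(f"RAD-based estimate: {observed_rad_pi:.4f}")
print(f"ABC-corrected estimate: {corrected_theta:.4f} (true value {true_theta})")
```

In this toy setting the RAD-based statistic falls below the value used to generate the data, and the mean of the accepted draws moves the estimate back toward it. A realistic correction would replace the toy forward model with simulations of the actual sampling scheme and enzyme.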

Introduction

RAD-seq is a powerful tool, increasingly used in population genomics. However, earlier studies have raised red flags regarding possible biases associated with this technique. Reduced-representation genomics aims at sequencing particular parts of the genomes of many individuals in a single sequencing reaction, rather than full genomes of one or a few individuals. One such approach, RAD-seq (and related protocols), makes use of restriction enzymes to […]

[…] and polymorphism can themselves be linked (e.g. lower GC content in neutral and more polymorphic regions), this can impact diversity estimates. Particular motifs present in the restriction site might be enriched in some particular regions of the genomes (e.g. motifs corresponding to protein domains [7]). Such biases probably exist for any kind of molecular marker, because of the inherent contradiction between “targeted” and “random” sequencing. Individuals or haplotypes that are more closely related than the population average tend to share the same state at the restriction site (presence or absence), and are over-represented in RAD-seq datasets.

Cariou et al. BMC Evolutionary Biology (2016) 16:240
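
The over-representation of closely related haplotypes can be made concrete with a small hand-built example. The sketch below is purely illustrative: the six haplotypes, their flanking sequences and the choice of the EcoRI motif `GAATTC` are assumptions made for the example, not data from the study. Haplotypes whose restriction site is mutated are dropped, as they would be in a RAD-seq library, and pairwise diversity is computed on the flanking sequence before and after this filtering.

```python
from itertools import combinations

MOTIF = "GAATTC"  # EcoRI recognition sequence, used only as an example


def pairwise_diversity(flanks):
    """Mean number of pairwise differences per site among aligned sequences."""
    pairs = list(combinations(flanks, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / (len(pairs) * len(flanks[0]))


# (restriction site, flanking sequence) for six hypothetical haplotypes.
# The first three form one clade with an intact site; the last three form
# a more distant clade that shares a mutation in the site.
haplotypes = [
    ("GAATTC", "ACGTACGTAC"),
    ("GAATTC", "ACGTACGTAC"),
    ("GAATTC", "ACGTACGTAT"),
    ("GAATCC", "ACTTACGGAC"),
    ("GAATCC", "ACTTACGGAC"),
    ("GAATCC", "ACTTACGGAA"),
]

all_flanks = [flank for _, flank in haplotypes]
rad_flanks = [flank for site, flank in haplotypes if site == MOTIF]

pi_true = pairwise_diversity(all_flanks)  # diversity in the full sample
pi_rad = pairwise_diversity(rad_flanks)   # diversity seen through RAD-seq

print(f"true diversity: {pi_true:.3f}")
print(f"RAD-based diversity: {pi_rad:.3f}")
```

Here the full sample has a diversity of 0.18 differences per site, while the three haplotypes that retain the site give about 0.067, because the comparisons between the two clades, which contribute most of the differences, are never observed.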
