Abstract

Sequencing of pooled samples (Pool-Seq) using next-generation sequencing technologies has become increasingly popular because it is a rapid and cost-effective way to determine allele frequencies for single nucleotide polymorphisms (SNPs) in population pools. Validation of allele frequencies determined by Pool-Seq has been attempted using individual genotyping, but these studies tend to use samples from existing model organism databases or DNA stores and do not validate a realistic setup for sampling natural populations. Here we used pyrosequencing to validate allele frequencies determined by Pool-Seq in three natural populations of Arabidopsis halleri (Brassicaceae). The allele frequency estimates of the pooled population samples (each consisting of 20 individual plant DNA samples) were determined after mapping Illumina reads to (i) the publicly available, high-quality reference genome of a closely related species (Arabidopsis thaliana) and (ii) our own de novo draft genome assembly of A. halleri. We then pyrosequenced nine selected SNPs in the same individuals from each population, resulting in a total of 540 samples. Our results show a highly significant and accurate relationship between pooled and individually determined allele frequencies, irrespective of the reference genome used. Allele frequencies differed on average by less than 4%. Neither the Pool-Seq nor the individual-based approach tended to yield higher or lower estimates of allele frequencies. Moreover, the rather high coverage in the mapping to the two reference genomes, ranging from 55 to 284x, had no significant effect on the accuracy of Pool-Seq. A resampling analysis showed that only very low coverage values (below 10-20x) would substantially reduce the precision of the method. We therefore conclude that a pooled re-sequencing approach is well suited for analyses of genetic variation in natural populations.
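
The effect of coverage on precision can be illustrated with a minimal simulation. This is a sketch under simplifying assumptions, not the resampling analysis used in the study: each read at a SNP is treated as an independent draw from the pool's true allele frequency, and sequencing errors, mapping bias, and unequal DNA contributions are ignored. The function names, the true frequency of 0.3, and the number of replicates are hypothetical choices made for illustration.

```python
import random
import statistics


def resample_allele_frequency(true_freq, coverage, rng):
    """Draw `coverage` reads at one SNP and return the observed allele frequency.

    Assumption: every read independently carries the focal allele with
    probability `true_freq` (simple binomial sampling).
    """
    focal_reads = sum(1 for _ in range(coverage) if rng.random() < true_freq)
    return focal_reads / coverage


def mean_abs_error(true_freq, coverage, replicates=2000, seed=1):
    """Average absolute deviation of the read-based estimate from `true_freq`."""
    rng = random.Random(seed)
    errors = [
        abs(resample_allele_frequency(true_freq, coverage, rng) - true_freq)
        for _ in range(replicates)
    ]
    return statistics.mean(errors)


if __name__ == "__main__":
    # Coverage values chosen to bracket the thresholds mentioned in the abstract.
    for coverage in (5, 10, 20, 55, 284):
        print(coverage, round(mean_abs_error(0.3, coverage), 3))
```

Under this model the sampling error shrinks roughly with the square root of the coverage, which is in line with the observation that precision suffers mainly at very low coverage.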

Introduction

During the past few years, the use of next-generation sequencing technologies (NGS, [1]) has dramatically increased. NGS allows studies to expand to a truly genome-wide scale, potentially analyzing millions of polymorphisms at a time (e.g., [2,4,5]). A pooled approach, in which each sample consists of several individuals, might be preferable in many study designs. Futschik and Schlötterer [6] demonstrated that sequencing of pooled samples (Pool-Seq) using NGS reduces costs and workload while still reliably detecting single nucleotide polymorphisms (SNPs) and accurately estimating various population genomic parameters. Pool-Seq requires that individual DNA samples are combined in equimolar concentrations, and measurement and/or pipetting errors might have a negative impact on allele frequency estimates, at least in small pools [6].
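
To see why equimolar pooling matters, the expected allele frequency in a pool can be written as a DNA-weighted average over individuals. The sketch below is illustrative only and rests on assumptions not stated in the text: individuals are diploid, genotypes are coded as the number of copies of the focal allele (0, 1, or 2), and the genotypes and DNA amounts are hypothetical values.

```python
def pool_allele_frequency(genotypes, dna_amounts=None):
    """Expected frequency of the focal allele in a DNA pool.

    genotypes:   copies of the focal allele per diploid individual (0, 1 or 2)
    dna_amounts: relative DNA contribution per individual; equal if omitted
    """
    if dna_amounts is None:
        dna_amounts = [1.0] * len(genotypes)
    if len(dna_amounts) != len(genotypes):
        raise ValueError("need one DNA amount per individual")
    total = sum(dna_amounts)
    # Each individual contributes its own allele frequency (copies / 2),
    # weighted by its share of the pooled DNA.
    return sum(a * g / 2 for g, a in zip(genotypes, dna_amounts)) / total


if __name__ == "__main__":
    # Hypothetical pool of 20 individuals: 4 homozygous for the focal allele,
    # 8 heterozygous, 8 homozygous for the other allele.
    genotypes = [2] * 4 + [1] * 8 + [0] * 8
    print(pool_allele_frequency(genotypes))

    # Same pool, but the four focal-allele homozygotes contribute 1.5x the DNA.
    skewed = [1.5] * 4 + [1.0] * 16
    print(pool_allele_frequency(genotypes, skewed))
```

With equal contributions the pool frequency matches the frequency among individuals (16 of 40 allele copies); over-representing a few individuals shifts it, and the shift is larger the fewer individuals the pool contains.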
