Abstract
The sequencing of pooled non-barcoded individuals is an inexpensive and efficient means of assessing genome-wide population allele frequencies, yet its accuracy has not been thoroughly tested. We assessed the accuracy of this approach on whole, complex eukaryotic genomes by resequencing pools of largely isogenic, individually sequenced Drosophila melanogaster strains. We called SNPs in the pooled data and estimated false positive and false negative rates using the SNPs called in the individual strains as a reference. We also estimated the allele frequencies of the SNPs from the “pooled” data and compared them with the “true” frequencies taken from the estimates in the individual strains. We demonstrate that pooled sequencing provides a faithful estimate of population allele frequency, with the error well approximated by binomial sampling, and is a reliable means of novel SNP discovery with low false positive rates. However, a sufficient number of strains should be used in the pooling, because variation in the amount of DNA derived from individual strains is a substantial source of noise when the number of pooled strains is low. Our results and analysis confirm that pooled sequencing is a very powerful and cost-effective technique for assessing patterns of sequence variation in populations on genome-wide scales, and is applicable to any dataset where sequencing individuals or individual cells is impossible, difficult, time consuming, or expensive.
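As a rough illustration of what "error well approximated by binomial sampling" implies, the sketch below computes the expected standard error of a pooled allele frequency estimate at several read depths. It assumes each read is an independent draw from the pool with probability equal to the true allele frequency. The function name `binomial_standard_error` and the example frequency and depths are hypothetical choices for illustration, not values or code from the study.

```python
import math


def binomial_standard_error(true_freq: float, read_depth: int) -> float:
    """Standard error of an allele frequency estimated from `read_depth` reads.

    Assumption (illustrative): every read is an independent draw that carries
    the allele with probability `true_freq`, i.e. pure binomial sampling.
    """
    if not 0.0 <= true_freq <= 1.0:
        raise ValueError("true_freq must lie in [0, 1]")
    if read_depth <= 0:
        raise ValueError("read_depth must be positive")
    return math.sqrt(true_freq * (1.0 - true_freq) / read_depth)


# Hypothetical example: an allele at 25% frequency sampled at three read depths.
for depth in (10, 50, 200):
    se = binomial_standard_error(0.25, depth)
    print(f"depth={depth:>3}  expected SE of frequency estimate = {se:.3f}")
```

Under this simplified model the error shrinks with the square root of read depth. It does not capture the additional noise from unequal DNA contributions of individual strains, which the abstract identifies as substantial when few strains are pooled.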
Highlights
Efficient assessment of the presence and frequencies of single-nucleotide polymorphisms (SNPs) in populations is vital to answering key problems in genetics and population biology
We investigated how mapping strategy, read depth, and unequal DNA contribution affect the accuracy of population allele frequency estimates from pooled sequencing, and assessed the reproducibility of the technique
We looked at DGRP SNP positions that were covered to ≥10× read depth in library A and compared DGRP allele frequency estimates to those from library A
Summary
Efficient assessment of the presence and frequencies of single-nucleotide polymorphisms (SNPs) in populations is vital to answering key problems in genetics and population biology. Inference of demographic history, identification of causative loci affecting a trait of interest, discovery of cancer-causing mutations in mixed pools of cells, and the search for evidence of natural selection in the genome all require knowledge of the frequency spectra in groups of individuals or cells. Individually sequencing dozens of individuals from each population is often costly and labor intensive. Multiplexing techniques allow a more efficient use of sequencing resources but still require a large number of individual DNA extractions, manipulations of reagents, barcoding oligos, PCR reactions, and sequencing library constructions. Pooling individuals prior to DNA extraction and sequencing the pooled DNA without barcodes can generate an inexpensive and efficient assessment of allele frequencies genome-wide.