Abstract

Sequencing a large number of individuals, which is often needed for population genetics studies, is still economically challenging despite the falling costs of Next Generation Sequencing (NGS). Pool-seq is a cost- and time-effective alternative in which DNA from several individuals is pooled for sequencing. However, pooling DNA creates new challenges for accurate variant calling and allele frequency (AF) estimation. In particular, sequencing errors are confounded with alleles present at low frequency in the pools, possibly giving rise to false-positive variants. We sequenced 996 individuals in 83 pools (12 individuals/pool) in a targeted re-sequencing experiment. We show that Pool-seq AFs are robust and reliable by comparing them with public variant databases and with in-house SNP-genotyping data of the individuals in the pools. Furthermore, we propose a simple filtering guideline, based on the Kolmogorov-Smirnov statistical test, for the removal of spurious variants. We validated our filters experimentally by comparing Pool-seq to individual sequencing data, showing that the filters remove most of the false variants while retaining the majority of true variants. The proposed guideline is fairly generic and could easily be applied in other Pool-seq experiments.
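The abstract names the Kolmogorov-Smirnov (KS) test but does not say which quantities it compares. The sketch below assumes one plausible setup: at each candidate site, it compares the base qualities of reads supporting the alternate allele with those of reads supporting the reference allele. The rationale is that alternate calls produced by sequencing errors tend to sit on low-quality bases. The function names (`ks_statistic`, `ks_pvalue`, `keep_variant`), the threshold `alpha = 0.01` and the toy quality values are all hypothetical. The p-value uses the large-sample (asymptotic) Kolmogorov approximation.

```python
import math
from typing import Sequence


def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max_v |F_x(v) - F_y(v)|."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    # Step through the pooled sorted values; after consuming every copy of v
    # from both samples, i/n and j/m are the two empirical CDFs evaluated at v.
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d: float, n: int, m: int) -> float:
    """Asymptotic two-sided p-value: Q(lam) = 2 * sum_k (-1)^(k-1) exp(-2 k^2 lam^2),
    with lam = sqrt(n*m/(n+m)) * d. Only a large-sample approximation."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:  # Q(lam) differs from 1 by less than 1e-12 here
        return 1.0
    q = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return min(1.0, max(0.0, q))


def keep_variant(ref_quals: Sequence[float], alt_quals: Sequence[float],
                 alpha: float = 0.01) -> bool:
    """Hypothetical per-site filter (assumed, not necessarily the authors' rule):
    drop the variant when alt-read base qualities are distributed differently
    from ref-read base qualities at the same site."""
    if not ref_quals or not alt_quals:
        return True  # the test is undefined; leave the decision to other filters
    d = ks_statistic(ref_quals, alt_quals)
    return ks_pvalue(d, len(ref_quals), len(alt_quals)) >= alpha


# Toy Phred base qualities (hypothetical values, for illustration only).
ref = [38, 37, 39, 36, 38, 40, 37, 38] * 4   # 32 reference-supporting reads
alt_good = [37, 38, 36, 39]                  # alt reads resembling ref reads
alt_bad = [12, 10, 14, 11]                   # alt reads all on low-quality bases
print(keep_variant(ref, alt_good))  # True
print(keep_variant(ref, alt_bad))   # False
```

A two-sided test flags any difference between the two quality distributions. If only alternate reads of unusually low quality are of concern, a one-sided test would fit better. On real data, `scipy.stats.ks_2samp` provides the same test, with an `alternative` argument for one-sided versions and exact p-values for small samples.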

Highlights

  • One of the key interests of population genetics studies is information about polymorphic sites and the corresponding allele frequency (AF) of variant alleles in the population

  • Comparing the quality distribution of rare variants reported in any of the 1000 Genomes, dbSNP, ExAC or ESP databases (N = 6359) with that of rare variants not annotated in any public database (N = 12780), we found a disproportionate number of lower-quality variants in the novel rare variant category [Fig. 5(a)]

  • Pool-seq can be successfully used as a cost-effective alternative to individual sequencing for population genetics studies
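The database comparison in the second highlight can be sketched as a simple partition. Rare variants are split by whether their identifier appears in the union of database annotations, and each group then reports how many of its variants fall below a quality cutoff. This is a coarser summary than the full distributional comparison shown in Fig. 5(a). The record layout `(variant_id, pooled_AF, quality)`, the rarity threshold `rare_af = 0.05`, the cutoff `low_qual = 20` and the toy variants are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record: (variant_id, pooled allele frequency, variant quality)
Variant = Tuple[str, float, float]


def low_quality_by_annotation(
    variants: Iterable[Variant],
    known_ids: Set[str],
    rare_af: float = 0.05,
    low_qual: float = 20.0,
) -> Dict[str, Tuple[int, float]]:
    """Split rare variants into database-annotated ("in_db") and "novel" groups
    and return, per group, (group size, fraction with quality < low_qual)."""
    groups: Dict[str, List[float]] = {"in_db": [], "novel": []}
    for vid, af, qual in variants:
        if af >= rare_af:
            continue  # only rare variants enter the comparison
        groups["in_db" if vid in known_ids else "novel"].append(qual)
    return {
        name: (len(q), sum(x < low_qual for x in q) / len(q) if q else 0.0)
        for name, q in groups.items()
    }


# Toy input: IDs found in the union of databases (hypothetical).
known = {"1:100:A:G", "1:200:C:T"}
variants = [
    ("1:100:A:G", 0.02, 50.0),
    ("1:200:C:T", 0.03, 45.0),
    ("1:300:G:A", 0.01, 8.0),
    ("1:400:T:C", 0.02, 12.0),
    ("1:500:A:C", 0.04, 60.0),
    ("1:600:C:G", 0.30, 5.0),   # common variant, excluded from the comparison
]
print(low_quality_by_annotation(variants, known))
# {'in_db': (2, 0.0), 'novel': (3, 0.6666666666666666)}
```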


Summary

Introduction

One of the key interests of population genetics studies is information about polymorphic sites and the corresponding AF of variant alleles in the population. Pool-seq should give a more robust estimate of AF because, at a given cost, more individuals can be sampled, and the larger sample size decreases the overall variance of the estimated AF [6]. This hypothesis is well supported by mathematical models under the assumptions that there are no sequencing errors and that each individual contributes an equal amount of DNA to the pools [7,8,9]. In the present study, involving targeted re-sequencing of 996 individuals in 83 pools, we show that Pool-seq can be used to accurately estimate AFs of variant alleles. By comparing Pool-seq with several public variant databases and with SNP-array data of the individuals constituting the pools, we show that Pool-seq AFs are robust and reliable. We also individually sequenced all subjects of a single pool, identified their variants, and compared them with the Pool-seq results. The comparison shows that the proposed filters yield low rates of false-positive and false-negative variants, demonstrating their utility and efficacy.
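A standard two-stage binomial model makes the variance argument concrete. It is a sketch under the paragraph's stated assumptions (no sequencing errors, equal DNA contribution per individual) plus three additional ones: individuals are diploid, the site is covered by a fixed read depth C, and reads are drawn independently from the pooled chromosomes. It is not necessarily the exact model of refs [7,8,9]. Here n is the number of individuals in the pool, p the population AF, K the number of alternate alleles among the 2n pooled chromosomes, X the number of alternate reads, and \hat{p} = X/C the Pool-seq estimate.

```latex
\begin{aligned}
K &\sim \mathrm{Binomial}(2n,\, p), \qquad q = K/(2n),\\
X \mid q &\sim \mathrm{Binomial}(C,\, q), \qquad \hat{p} = X/C, \qquad \mathbb{E}[\hat{p}] = p,\\
\operatorname{Var}(\hat{p}) &= \mathbb{E}\!\left[\operatorname{Var}(\hat{p}\mid q)\right] + \operatorname{Var}\!\left(\mathbb{E}[\hat{p}\mid q]\right)
  = \frac{\mathbb{E}[q(1-q)]}{C} + \frac{p(1-p)}{2n},\\
\mathbb{E}[q(1-q)] &= p - \left(\frac{p(1-p)}{2n} + p^2\right) = p(1-p)\left(1 - \frac{1}{2n}\right),\\
\operatorname{Var}(\hat{p}) &= p(1-p)\left(\frac{1}{C} + \frac{1}{2n} - \frac{1}{2nC}\right).
\end{aligned}
```

As C grows, the variance approaches p(1-p)/(2n), the sampling variance of 2n chromosomes. Beyond a certain depth, therefore, only adding individuals lowers the variance further. For one pool of n = 12 at C = 100, the bracket is 0.01 + 0.04167 - 0.00042 = 0.05125, of which the 1/(2n) term alone contributes about 0.0417.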

