Abstract

High-throughput sequencing of pooled DNA samples can facilitate genome-wide studies of rare and low-frequency variants in a large population. Two major questions concerning the pooled sequencing strategy are whether rare and low-frequency variants can be detected reliably, and whether the estimated minor allele frequencies (MAFs) represent the actual values obtained from individually genotyped samples. In this study, we evaluated MAF estimates using three variant detection tools with two sets of pooled whole exome sequencing (WES) data and one set of pooled whole genome sequencing (WGS) data. Both GATK and Freebayes displayed high sensitivity, specificity and accuracy when detecting rare or low-frequency variants. For the WGS study, 56% of the low-frequency variants on the Illumina array had identical MAFs in the sequencing and individual genotyping data, and 26% differed by one allele. The MAF estimates from WGS correlated well (r = 0.94) with those from the Illumina arrays. The MAFs from the pooled WES data also showed high concordance (r = 0.88) with those from the individual genotyping data. In conclusion, the MAFs estimated from pooled DNA sequencing data closely reflect the MAFs in individually genotyped samples. The pooling strategy can thus be a rapid and cost-effective approach for initial screening in large-scale association studies.

Highlights

  • We compared the variants detected by three different tools (SAMtools, Genome Analysis Toolkit (GATK) and Freebayes) and evaluated the accuracy of minor allele frequency (MAF) estimates from pooled DNA sequencing data by comparing them with the MAFs obtained from individual genotyping data

  • The difference in the numbers of single nucleotide variants (SNVs) was due to novel variants that were not annotated in dbSNP, or to rare variants with an alternative allele frequency (AAF) of less than 1%

Summary

Introduction

[...] of genetic variants underlying Mendelian diseases[6,7,8], and there is strong interest in extending the application of NGS to complex traits[9]. Whole genome sequencing (WGS) and whole exome sequencing (WES)[10,11] are becoming increasingly popular because of their wide coverage and single-base resolution. However, these techniques remain costly, laborious and time-consuming for most laboratories involved in population-based association studies. To capture rare variants related to complex diseases, the ideal approach is to sequence every individual sample in a very large cohort[12]. In addition to enabling the identification of rare variants in candidate genes[13,14], certain pooled DNA sequencing studies at the whole exome scale have reported low-frequency variants associated with complex diseases[15,16]. We compared the variants detected by three different tools (SAMtools, GATK and Freebayes) and evaluated the accuracy of MAF estimates from pooled DNA sequencing data by comparing them with the MAFs obtained from individual genotyping data.
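
The comparison of pooled MAF estimates with individually genotyped MAFs can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names, the read counts and the pool size are hypothetical, and the sketch assumes that a pooled MAF is estimated as the fraction of reads carrying the minor allele and that a pool of N diploid individuals contains 2N chromosomes. Under those assumptions, two MAFs are "identical" or differ by "one allele" when their rounded allele counts differ by 0 or 1.

```python
from math import sqrt


def pooled_maf(alt_reads, total_reads):
    """Estimate the MAF at one site from read counts in a pooled sample.

    Assumption: the fraction of reads carrying the alternative allele
    approximates its frequency in the pool; the minor allele is whichever
    of the two alleles is rarer.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    aaf = alt_reads / total_reads
    return min(aaf, 1.0 - aaf)


def allele_count(maf, n_individuals):
    """Convert a frequency to a whole number of alleles among 2N chromosomes."""
    return round(maf * 2 * n_individuals)


def allele_difference(maf_pool, maf_array, n_individuals):
    """Absolute difference in allele counts between two MAF estimates."""
    return abs(allele_count(maf_pool, n_individuals)
               - allele_count(maf_array, n_individuals))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical values for illustration only; not data from the study.
    n_individuals = 50
    read_counts = [(5, 100), (12, 300), (2, 200)]   # (alt reads, total reads)
    array_mafs = [0.04, 0.04, 0.02]                 # individually genotyped
    pool_mafs = [pooled_maf(a, t) for a, t in read_counts]
    for p, g in zip(pool_mafs, array_mafs):
        print(p, g, allele_difference(p, g, n_individuals))
    print("r =", pearson_r(pool_mafs, array_mafs))
```

Expressing the difference in allele counts rather than raw frequencies makes the comparison independent of read depth, which is one plausible reading of the "one allele difference" reported in the abstract.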

