Abstract

Background: The reduction in the cost of sequencing a human genome has led to the use of genotype sampling strategies to impute and infer the presence of sequence variants that can then be tested for associations with traits of interest. Low-coverage whole genome sequencing (WGS) is a sampling strategy that overcomes some of the deficiencies seen in fixed-content SNP array studies. Linkage disequilibrium (LD)-aware variant callers, such as the program Thunder, may provide a calling rate and accuracy that make a low-coverage sequencing strategy viable.

Results: We examined the performance of an LD-aware variant calling strategy in a population of 708 low-coverage whole genome sequences from a community sample of Native Americans. We assessed variant calling by comparing the sequencing results to genotypes measured in 641 of the same subjects using a fixed-content, first-generation exome array. The comparison was made for two variant callers: the GATK Unified Genotyper and the LD-aware variant caller Thunder. Thunder was found to improve concordance in a coverage-dependent fashion, while correctly calling nearly all of the common variants as well as a high percentage of the rare variants present in the sample.

Conclusions: Low-coverage WGS is a strategy that appears to collect genetic information intermediate in scope between fixed-content genotyping arrays and deep-coverage WGS. Our data suggest that, for large-sample association analyses, low-coverage WGS is a viable strategy with a greater chance of discovering novel variants and associations than fixed-content arrays.

Highlights

  • The reduction in the cost of sequencing a human genome has led to the use of genotype sampling strategies in order to impute and infer the presence of sequence variants that can be tested for associations with traits of interest

  • Even with exponential declines in the cost of next-generation genomic sequencing, there are still difficulties associated with using Whole Genome Sequencing (WGS) to conduct association studies of complex traits, because the moderate to small effect sizes of the variants typically involved in the etiology of such traits require large sample sizes

  • One approach to increase the power of such studies without increasing sequencing cost is to use whole-exome sequencing (WES), in which only a small fraction of the genome is sequenced, but at high coverage [4,5]


Summary

Introduction

The reduction in the cost of sequencing a human genome has led to the use of genotype sampling strategies to impute and infer the presence of sequence variants that can be tested for associations with traits of interest. Low-coverage whole genome sequencing (WGS) is a sampling strategy that overcomes some of the deficiencies seen in fixed-content SNP array studies. Compared with WGS, genotype sampling is at present a more cost-effective strategy for identifying variants associated with traits of interest. Genome-wide association studies (GWAS) using fixed-content marker arrays represent a genotype sampling strategy that has been used successfully in a number of studies to identify SNPs significantly associated with complex traits [1,2,3]. A second approach is to perform WGS but reduce the overall coverage. The success of this low-coverage strategy is contingent upon the ability to locate variant sites and accurately call genotypes when each site may be covered by only a small number of reads (e.g., less than 5× coverage). If variant calling in low-coverage WGS is acceptable, the larger genomic landscape sequenced relative to GWAS and whole-exome sequencing (WES) allows for a greater chance of discovering novel variants and associations.
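To make the difficulty of calling genotypes from a handful of reads concrete, the following is a minimal sketch, not the authors' method. It assumes read depth at a site is Poisson-distributed around the mean coverage, that each read at a heterozygous site samples either allele with probability 1/2, and that there are no sequencing errors. The function name and parameters are hypothetical and chosen only for illustration.

```python
import math


def prob_het_looks_homozygous(mean_depth, max_depth=200):
    """Chance that a truly heterozygous site shows only one allele (or no reads).

    Illustrative assumptions (not from the source): Poisson read depth,
    equal sampling of both alleles, and error-free reads.
    """
    p_depth = math.exp(-mean_depth)  # P(depth = 0)
    total = p_depth                  # no reads at all: the genotype cannot be observed
    for d in range(1, max_depth + 1):
        p_depth *= mean_depth / d    # Poisson recurrence: P(d) = P(d - 1) * mean / d
        # With d reads, all of them carry the same allele with probability 2 * (1/2)^d.
        total += p_depth * 2 * 0.5 ** d
    return total


if __name__ == "__main__":
    for coverage in (2, 4, 10, 30):
        print(coverage, prob_het_looks_homozygous(coverage))
```

Under these simplified assumptions, about a quarter of heterozygous sites at a mean depth of 4× would show reads from only one allele, whereas at 30× the fraction is negligible. This is the kind of gap that a caller drawing on information beyond the reads at a single site, such as LD with neighboring variants, would need to close.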

