Abstract

Although DNA array-based approaches for genome-wide association studies (GWAS) permit the collection of thousands of low-cost genotypes, this often comes at the expense of resolution and completeness, because SNP chip technologies are ultimately limited by the SNPs chosen during array development. An alternative low-cost approach is low-pass whole genome sequencing (WGS) followed by imputation. Rather than relying on high levels of genotype confidence at a select set of loci, low-pass WGS and imputation rely on the combined information from millions of randomly sampled low-confidence genotypes. To investigate low-pass WGS and imputation in the dog, we assessed accuracy and performance by downsampling 97 high-coverage (> 15×) WGS datasets from 51 different breeds to approximately 1× coverage, simulating low-pass WGS. Using a reference panel of 676 dogs from 91 breeds, genotypes were imputed from the downsampled data and compared to a truth set of genotypes generated from high-coverage WGS. Using our truth set, we optimized a variant quality filtering strategy that retained approximately 80% of 14 M imputed sites and lowered the imputation error rate from 3.0% to 1.5%. Seven million sites remained with a MAF > 5% and an average imputation quality score of 0.95. Finally, we simulated the impact of imputation errors on outcomes for case–control GWAS: small effect sizes were the most affected, whereas medium-to-large effect sizes were only slightly affected. These analyses provide best practice guidelines for study design and data post-processing of low-pass WGS-imputed genotypes in dogs.
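
The comparison of imputed genotypes against a truth set, before and after quality filtering, can be illustrated with a minimal sketch. It assumes genotypes are coded as alternate-allele counts (0, 1, 2) and that each site carries a single imputation quality score. The function names, the toy values, and the 0.9 threshold are hypothetical and chosen only for illustration; they are not the study's data or its filtering strategy.

```python
def discordance_rate(truth, imputed):
    """Fraction of sites where the imputed genotype differs from the truth genotype."""
    if len(truth) != len(imputed):
        raise ValueError("truth and imputed must have the same number of sites")
    if not truth:
        return 0.0
    mismatches = sum(t != i for t, i in zip(truth, imputed))
    return mismatches / len(truth)


def filter_by_quality(truth, imputed, quality, min_quality):
    """Keep only sites whose quality score is at least min_quality.

    Returns (retained_fraction, filtered_truth, filtered_imputed).
    """
    if not (len(truth) == len(imputed) == len(quality)):
        raise ValueError("all inputs must have the same number of sites")
    kept = [(t, i) for t, i, q in zip(truth, imputed, quality) if q >= min_quality]
    retained_fraction = len(kept) / len(truth) if truth else 0.0
    filtered_truth = [t for t, _ in kept]
    filtered_imputed = [i for _, i in kept]
    return retained_fraction, filtered_truth, filtered_imputed


# Toy data (hypothetical): ten sites, genotypes as alternate-allele counts.
truth = [0, 1, 2, 1, 0, 2, 1, 0, 1, 2]
imputed = [0, 1, 2, 0, 0, 2, 1, 1, 1, 2]
quality = [0.99, 0.95, 0.97, 0.40, 0.92, 0.98, 0.96, 0.55, 0.93, 0.99]

error_before = discordance_rate(truth, imputed)
retained, kept_truth, kept_imputed = filter_by_quality(truth, imputed, quality, 0.9)
error_after = discordance_rate(kept_truth, kept_imputed)

print(f"error before filtering: {error_before:.2f}")
print(f"sites retained: {retained:.0%}")
print(f"error after filtering: {error_after:.2f}")
```

In this toy case the two discordant sites also have the lowest quality scores, so filtering removes them. In real data the trade-off between retained sites and residual error would have to be tuned against the truth set.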


Introduction

The price per marker for a genotyping assay can have a large influence on the success of genetic association studies. DNA genotyping arrays are limited by various known and unknown biases that arise during marker selection and probe design. These biases cannot be removed without designing a new DNA array, which is an expensive and time-consuming process. A similarly priced alternative is low-pass whole genome sequencing (WGS) and imputation (Martin et al. 2021). Rather than assigning genotypes based on high-confidence calls across a finite set of loci, low-pass WGS combines information from millions of randomly sampled low-confidence variant calls to impute likely genotypes from a reference panel, which is composed of a large collection of WGS datasets representing potential haplotypes found within a population. Because low-pass WGS is not biased toward sampling specific loci, a major limiting factor is the reference panel used. The utility of previous datasets can only improve with updated reference panels and is not hampered by acquisition bias of predetermined sites.
