Abstract

Background: Inherited susceptibility to common, complex diseases may be caused by rare, pathogenic variants (“monogenic”) or by the cumulative effect of numerous common variants (“polygenic”). Comprehensive genome interpretation should enable assessment for both monogenic and polygenic components of inherited risk. The traditional approach requires two distinct genetic testing technologies: high coverage sequencing of known genes to detect monogenic variants, and a genome-wide genotyping array followed by imputation to calculate genome-wide polygenic scores (GPSs). We assessed the feasibility and accuracy of using low coverage whole genome sequencing (lcWGS) as an alternative to genotyping arrays to calculate GPSs.

Methods: First, we performed downsampling and imputation of WGS data from ten individuals to assess concordance with known genotypes. Second, we assessed the correlation between GPSs for 3 common diseases (coronary artery disease (CAD), breast cancer (BC), and atrial fibrillation (AF)) calculated using lcWGS and genotyping array in 184 samples. Third, we assessed concordance of lcWGS-based genotype calls and GPS calculation in 120 individuals with known genotypes, selected to reflect diverse ancestral backgrounds. Fourth, we assessed the relationship between GPSs calculated using lcWGS and disease phenotypes in a cohort of 11,502 individuals of European ancestry.

Results: We found imputation accuracy r² values of greater than 0.90 for all ten samples, including those of African and Ashkenazi Jewish ancestry, with lcWGS data at 0.5×. GPSs calculated using lcWGS and genotyping array followed by imputation in 184 individuals were highly correlated for each of the 3 common diseases (r² = 0.93–0.97) with similar score distributions. Using lcWGS data from 120 individuals of diverse ancestral backgrounds, we found similar results with respect to imputation accuracy and GPS correlations. Finally, we calculated GPSs for CAD, BC, and AF using lcWGS in 11,502 individuals of European ancestry, confirming odds ratios per standard deviation increment ranging from 1.28 to 1.59, consistent with previous studies.

Conclusions: lcWGS is an alternative technology to genotyping arrays for common genetic variant assessment and GPS calculation. lcWGS provides comparable imputation accuracy while also overcoming the ascertainment bias inherent to variant selection in genotyping array design.

Highlights

  • Inherited susceptibility to common, complex diseases may be caused by rare, pathogenic variants (“monogenic”) or by the cumulative effect of numerous common variants (“polygenic”)

  • While imputation accuracy was similar across diverse populations, it was slightly reduced in the Colombian sample (HG01485), likely due to complex local ancestry related to admixture, and in the Yoruban sample (NA19240), likely due to the shorter blocks of linkage disequilibrium and higher genetic diversity in Africa [8]

  • We found that GPSCAD, GPSBC, and GPSAF were highly correlated (Fig. 3a–c), with the score mean (Student’s t test p = 0.17) and variance (F test p = 0.91) equivalent between lcWGS and the genotyping array


Summary

Introduction

Complex diseases may be caused by rare, pathogenic variants (“monogenic”) or by the cumulative effect of numerous common variants (“polygenic”). These diseases have long been recognized to be heritable, and recent advances in human genetics have led to consideration of DNA-based risk stratification to guide prevention or screening strategies. In some cases, such conditions can be caused by rare, “monogenic” pathogenic variants that lead to a several-fold increased risk; important examples are pathogenic variants in LDLR that cause familial hypercholesterolemia and pathogenic variants in BRCA1 and BRCA2 that underlie hereditary breast and ovarian cancer syndrome. Genome-wide polygenic scores (GPSs) provide a way to integrate information from numerous sites of common variation into a single metric of inherited susceptibility and are able to identify individuals with a several-fold increased risk of common, complex diseases, including coronary artery disease (CAD), breast cancer (BC), and atrial fibrillation (AF) [3]. For CAD, we previously noted that 8% of the population inherits more than triple the normal risk on the basis of polygenic variation, a prevalence more than 20-fold higher than that of monogenic familial hypercholesterolemia variants in LDLR that confer similar risk [3].
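
To make concrete the idea of integrating many common variants into a single metric, the sketch below computes a polygenic score as a weighted sum of effect-allele dosages and then standardizes the scores across a cohort. It is a minimal illustration under stated assumptions, not the scoring pipeline used in this study: the three variant weights, the sample names, and the dosage values are all hypothetical, and real GPSs draw on far more variants.

```python
from statistics import mean, pstdev


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (each between 0 and 2)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same variants")
    return sum(w * d for w, d in zip(weights, dosages))


def standardize(scores):
    """Convert raw scores to z-scores using the cohort mean and SD."""
    mu = mean(scores)
    sigma = pstdev(scores)
    return [(s - mu) / sigma for s in scores]


# Hypothetical per-variant weights (for example, log odds ratios from a GWAS).
weights = [0.10, -0.05, 0.20]

# Hypothetical imputed dosages for three samples at the same three variants.
cohort = {
    "sample_a": [0.0, 1.0, 2.0],
    "sample_b": [1.0, 1.0, 0.0],
    "sample_c": [2.0, 0.0, 1.0],
}

raw = {name: polygenic_score(d, weights) for name, d in cohort.items()}
z = dict(zip(raw, standardize(list(raw.values()))))

for name in cohort:
    print(f"{name}: raw={raw[name]:.2f} z={z[name]:+.2f}")
```

Standardizing within the cohort is what allows an effect to be reported per standard deviation increment of the score, as in the odds ratios quoted in the abstract.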

