Abstract

Genome-wide association studies (GWAS) have identified thousands of genetic variants linked to the risk of human disease. However, GWAS have so far remained largely underpowered in relation to identifying associations in the rare and low-frequency allelic spectrum and have lacked the resolution to trace causal mechanisms to underlying genes[1]. Here we combined whole-exome sequencing in 392,814 UK Biobank participants with imputed genotypes from 260,405 FinnGen participants (653,219 total individuals) to conduct association meta-analyses for 744 disease endpoints across the protein-coding allelic frequency spectrum, bridging the gap between common and rare variant studies. We identified 975 associations, with more than one-third being previously unreported. We demonstrate population-level relevance for mutations previously ascribed to causing single-gene disorders, map GWAS associations to likely causal genes, explain disease mechanisms, and systematically relate disease associations to levels of 117 biomarkers and clinical-stage drug targets. Combining sequencing and genotyping in two population biobanks enabled us to benefit from increased power to detect and explain disease associations, validate findings through replication and propose medical actionability for rare genetic variants. Our study provides a compendium of protein-coding variant associations for future insights into disease biology and drug discovery.

Highlights

  • In recent years, population biobanks have been added to the toolkit for disease gene discovery

  • We performed coding-wide association studies (CWAS) across 744 disease endpoints over a mean of 48,189 post-quality control coding variants across the allele frequency spectrum, derived from whole-exome sequencing of 392,814 European-ancestry individuals in UK Biobank (UKB), and meta-analysed these data with summary results from up to 260,405 individuals in FinnGen (FG) (Methods, Supplementary Table 2)

  • Our study revealed that a low-frequency missense variant in the methylase METTL11B (rs41272485:G; Ile127Met; minor allele frequency (MAF) = 3.9% (UKB) and 3.8% (FG)) was associated with increased atrial fibrillation (AF) risk (log(OR) = 0.14, P = 4.0 × 10^−11)
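To make the meta-analysis step and the reported effect size concrete, here is a minimal sketch of how per-cohort log odds ratios could be combined. It assumes a fixed-effect inverse-variance weighted scheme, which is a common choice for biobank meta-analyses but is not confirmed by this text, and it assumes that log(OR) denotes the natural logarithm. The function name `ivw_meta` and the per-cohort inputs are hypothetical illustrations, not values from the study.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis.

    betas: per-cohort effect estimates (log odds ratios)
    ses:   per-cohort standard errors of those estimates
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)

    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # Two-sided P value from the normal approximation: P = erfc(|z| / sqrt(2)).
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Hypothetical per-cohort inputs for illustration only (NOT from the paper).
beta, se, z, p = ivw_meta(betas=[0.15, 0.13], ses=[0.03, 0.03])

# Converting the reported log(OR) = 0.14 to an odds ratio,
# assuming a natural logarithm.
odds_ratio = math.exp(0.14)
```

Under the natural-log assumption, log(OR) = 0.14 corresponds to an odds ratio of about 1.15, that is, roughly 15% higher odds of AF per copy of the risk allele.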



Discussion

We have conducted the largest association study of protein-coding genetic variants so far against hundreds of disease endpoints ascertained from two massive population biobanks, UK Biobank and FinnGen. Of the 975 associations identified in our study, 145 are driven by unique variants in the rare and low-frequency allelic spectrum (allele frequencies between 0.1 and 2%), a range that neither GWAS nor sequencing studies have been able to interrogate thoroughly across a range of diseases and that is hypothesized to contribute to the ‘missing heritability’ of many human diseases[34]. We identify the most prominent relative power gain in the rarest variant frequency spectrum, highlighting a role for sequencing studies and for integrating additional population cohorts with enriched variants in identifying novel disease associations at scale.

A full list of members and their affiliations appears in the Supplementary Information.

