Abstract

Understanding the impact of rare variants is essential to understanding human health. We analyze rare (MAF < 0.1%) variants against 4264 phenotypes in 49,960 exome-sequenced individuals from the UK Biobank and 1934 phenotypes (1821 overlapping with UK Biobank) in 21,866 members of the Healthy Nevada Project (HNP) cohort who underwent Exome+ sequencing at Helix. After using our rare-variant-tailored methodology to reduce test statistic inflation, we identify 64 statistically significant gene-based associations in our meta-analysis of the two cohorts and 37 for phenotypes available in only one cohort. Singletons make significant contributions to our results, and the vast majority of the associations could not have been identified with a genotyping chip. Our results are available for interactive browsing in a webapp (https://ukb.research.helix.com). This comprehensive analysis illustrates the biological value of large, deeply phenotyped cohorts of unselected populations coupled with NGS data.

Highlights

  • Information on additional models tried, including a SKAT model and CADD-based cutoffs, can be found in the Supplementary No

  • Rare variant analyses using next-generation sequence data have been performed on a small number of phenotypes at a time, such as in schizophrenia, developmental delay, and diabetes[33,34,35]

  • In genes where multiple rare variants contribute to the signal, we find that mapping the precise contributions of each variant in the context of the secondary and tertiary structures can reveal the most functional parts of the gene for the given phenotype and provide additional support for a statistical association (Fig. 4)

Summary

Results

The median percentage of people carrying qualifying variants in each gene was the same for both the UKB and HNP cohorts, both in European ancestry and across ethnicities: 0.13% for the coding model, and 0.02% for the LoF model. To reduce test statistic inflation for binary traits, genes were only included in the LMM analysis if the expected number of variant carriers in the case group was at least ten, based on the overall carrier and phenotype frequency[21]. This is an essential step to avoid false positive associations in gene-based collapsing analysis results, especially when there is a case-control imbalance (Fig. 3). We found that each of the associations that were statistically significant in the mixed
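The carrier-count filter described above can be illustrated with a minimal sketch. It assumes the expected number of carriers among cases is the product of cohort size, overall carrier frequency, and phenotype (case) frequency, that is, that carrier status and phenotype are independent; the source states only that the expectation is "based on the overall carrier and phenotype frequency", so the exact formula, the function names, and the example case frequencies below are illustrative assumptions rather than the authors' implementation.

```python
def expected_case_carriers(n_samples: int, carrier_freq: float, case_freq: float) -> float:
    """Expected number of qualifying-variant carriers among cases.

    Assumption (not stated in the source): carrier status and case status
    are independent, so the expectation is a simple product.
    """
    return n_samples * carrier_freq * case_freq


def passes_carrier_filter(
    n_samples: int,
    carrier_freq: float,
    case_freq: float,
    min_expected: float = 10.0,  # threshold of ten carriers, as given in the text
) -> bool:
    """Return True if a gene/phenotype pair would be kept for the binary-trait test."""
    return expected_case_carriers(n_samples, carrier_freq, case_freq) >= min_expected


if __name__ == "__main__":
    n_ukb = 49_960            # UK Biobank exome cohort size from the abstract
    median_coding = 0.0013    # 0.13% median carrier frequency, coding model

    # Hypothetical phenotype frequencies, chosen only for illustration.
    for case_freq in (0.05, 0.20):
        expected = expected_case_carriers(n_ukb, median_coding, case_freq)
        kept = passes_carrier_filter(n_ukb, median_coding, case_freq)
        print(f"case_freq={case_freq:.2f} expected={expected:.2f} kept={kept}")
```

Under these assumptions, a gene at the median coding-model carrier frequency would be tested only against fairly common binary phenotypes, which is consistent with the filter's stated purpose of limiting inflation when cases are scarce.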
