Abstract

With the rapidly decreasing cost of the next-generation sequencing technology, a large number of whole genome sequences have been generated, enabling researchers to survey rare variants in the protein-coding and regulatory regions of the genome. However, it remains a daunting task to identify functional variants associated with complex diseases from whole genome sequencing (WGS) data because of the millions of candidate variants and yet moderate sample size. We propose to incorporate the Encyclopedia of DNA Elements (ENCODE) information in the association analysis of WGS data to boost the statistical power. We use the RegulomeDB and PolyPhen2 scores as external weights in existing rare variants association tests. We demonstrate the proposed framework using the WGS data and blood pressure phenotype from the San Antonio Family Studies provided by the Genetic Analysis Workshop 19. We identified a genome-wide significant locus in gene SNUPN on chromosome 15 that harbors a rare nonsynonymous variant, which was not detected by benchmark methods that did not incorporate biological information, including the T5 burden test and sequence kernel association test.

Highlights

  • Genome-wide association studies (GWAS) have identified thousands of genetic loci robustly associated with a wide range of complex diseases and traits

  • We observed that the genome-wide significant sliding windows centering at chr15:75912109 and chr15:75912182 included some variants in category 1, suggesting that the external biological information might have helped boost the signals

  • It turned out that exonic variant chr15:75913349 was not annotated in RegulomeDB, but was annotated as a probably damaging nonsynonymous variant by PolyPhen2 with a confidence score of 99.2 %, resulting in category 1 in our weighting scheme

Read more

Summary

Introduction

Genome-wide association studies (GWAS) have identified thousands of genetic loci robustly associated with a wide range of complex diseases and traits. There is a big gap between the disease heritability explained by GWAS-identified loci and that estimated from twin/family-based studies, leading to the so-called missing heritability [1] To fill in this gap, recent genetic studies have shifted gear from GWAS investigating common single-nucleotide polymorphisms with a minor allele frequency (MAF) larger than 5 % to low frequency (MAF between 1 and 5 %) and rare variants (RVs with MAFs

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call