Abstract

BackgroundWith the advance of next-generation sequencing technologies, the study of rare variants in targeted genome regions or even the whole genome becomes feasible. Nevertheless, the massive amount of sequencing data brings great computational and statistical challenges for association analyses. Aside from sequencing variants, other high-throughput omic data (eg, gene expression data) also become available, and can be incorporated into association analysis for better modeling and power improvement. This motivates the need of developing computationally efficient and powerful approaches to model the joint associations of multilevel omic data with complex human diseases.MethodsA similarity-based weighted U approach is used to model the joint effect of sequencing variants and gene expression. Using a Mexican American sample provided by Genetic Analysis Workshop 19 (GAW19), we performed a whole-genome joint association analysis of sequencing variants and gene expression with systolic (SBP) and diastolic blood pressure (DBP) and hypertension (HTN) phenotypes.ResultsThe whole-genome joint association analysis was completed in 80 min on a high-performance personal computer with an i7 4700 CPU and 8 GB memory. Although no gene reached statistical significance after adjusting for multiple testing, some top-ranked genes attained a high significance level and may have biological plausibility to hypertension-related phenotypes.ConclusionsThe weighted U approach is computationally efficient for high-dimensional data analysis, and is capable of integrating multiple levels of omic data into association analysis. Through a real data application, we demonstrate the potential benefit of using the new approach for joint association analysis of sequencing variants and gene expression.

Highlights

  • With the advance of next-generation sequencing technologies, the study of rare variants in targeted genome regions or even the whole genome becomes feasible

  • Driven by the advance of sequencing technology and limited heritability explained by the genome-wide association studies (GWAS) findings [2, 3], current research focus has shifted toward studying rare variants associated with common complex diseases

  • How to efficiently analyze the highdimensional sequencing data and other omic data remains a challenge. In this empirical study, we used a similarity based weighted U approach to jointly model single nucleotide variants (SNVs) and gene expression data of 142 unrelated Mexican American samples provided by Genetic Analysis Workshop 19 (GAW19)

Read more

Summary

Introduction

With the advance of next-generation sequencing technologies, the study of rare variants in targeted genome regions or even the whole genome becomes feasible. Next-generation sequencing technology provides denser genetic profiles than previous microarray-based genotyping technology [1] It could effectively capture rare variants with low minor allele frequency (MAF). Driven by the advance of sequencing technology and limited heritability explained by the genome-wide association studies (GWAS) findings [2, 3], current research focus has shifted toward studying rare variants associated with common complex diseases. These studies hold great promise for finding new genetic variants predisposing to human disease, they face great challenges, for example, low power for detecting rare variants because of their low frequency. By grouping and testing multiple SNVs, we are able to aggregate association signals and reduce the number of tests

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call