Abstract

High-throughput biotechnologies allow the collection of omics data to study the biological mechanisms underlying complex diseases at different levels, such as genomics, epigenomics, and transcriptomics. However, each technology is designed to collect a specific type of omics data, so the association between a disease and each type of omics data is usually tested individually, which is a suboptimal strategy. To better articulate biological processes and increase the consistency of variant identification, omics data from various platforms need to be integrated. In this report, we introduce an approach that uses a modified Fisher’s method (denoted Omnibus-Fisher) to combine the separate p-values, each calculated by kernel machine regression, from tests of association between a trait and SNPs, DNA methylation markers, and RNA sequencing data into an overall gene-level p-value, while accounting for the correlation between omics data types. To consider all possible disease models, we extend Omnibus-Fisher to an optimal test by using perturbations. In our simulations, the usual Fisher’s method has inflated type I error rates when applied directly to correlated omics data, whereas Omnibus-Fisher preserves the expected type I error rates. Moreover, Omnibus-Fisher has greater power than its optimal version when the true disease model involves all types of omics data, while the optimal Omnibus-Fisher is more powerful than the regular version when only one type of data is causal. Finally, we illustrate the proposed method by analyzing whole-genome genotyping, DNA methylation, and RNA sequencing data from a study of childhood asthma in Puerto Ricans.
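For orientation, the unmodified Fisher's method that the abstract uses as a baseline can be sketched as follows. It combines k independent p-values through the statistic T = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. This is only the standard textbook version, not the Omnibus-Fisher procedure, whose correlation adjustment is not described in this text. The function name and the three example p-values are hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Standard Fisher's method for INDEPENDENT p-values.

    Illustrative sketch only: this is not the Omnibus-Fisher test,
    and it makes no adjustment for correlated inputs.
    """
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(pvalues)
    # half_stat is T / 2, where T = -2 * sum(ln p_i).
    half_stat = -sum(math.log(p) for p in pvalues)

    # For even degrees of freedom 2k, the chi-square survival function
    # has the closed form exp(-T/2) * sum_{i=0}^{k-1} (T/2)^i / i!.
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical gene-level p-values for SNPs, methylation, and RNA-seq.
example = [0.04, 0.10, 0.20]
print(fisher_combined_pvalue(example))
```

Because this calculation assumes independence, applying it to p-values from correlated omics data is the situation in which the abstract reports inflated type I error rates.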

Highlights

  • Because of major advances in high-throughput biotechnologies, large amounts of omics data have been collected to study the biological mechanisms underlying complex diseases at different levels, such as genomics, epigenomics, and transcriptomics

  • We introduce an approach that uses a modified Fisher’s method to combine the separate p-values, each calculated by kernel machine regression, from tests of association between a trait and SNPs, DNA methylation markers, and RNA sequencing data into an overall gene-level p-value, while accounting for the correlation between omics data types

Introduction

Because of major advances in high-throughput biotechnologies, large amounts of omics data have been collected to study the biological mechanisms underlying complex diseases at different levels, such as genomics, epigenomics, and transcriptomics. These different types of omics data can help us understand a disease from several perspectives. The association between a complex disease and each type of omics data is usually tested individually, but this strategy is suboptimal. Integrating information from different biological datasets has the potential to yield better insight into the causal mechanisms of complex diseases than individual omics datasets can provide.
