A validated generally applicable approach using the systematic assessment of disease modules by GWAS reveals a multi-omic module strongly associated with risk factors in multiple sclerosis

Tejaswi V S Badam, Hendrik A De Weerd, Ingrid Kockum, Maja Jagodic, David Martínez-Enguita, Mika Gustafsson, Zelmina Lubovac-Pilav, Tomas Olsson, Lars Alfredsson

doi:10.1186/s12864-021-07935-1

Abstract

Background: There exist few, if any, practical guidelines for predictive and falsifiable multi-omic data integration that systematically integrate existing knowledge. Disease modules are popular concepts for interpreting genome-wide studies in medicine but have so far not been systematically evaluated and may lead to corroborating multi-omic modules.

Results: We assessed eight module identification methods in 57 previously published expression and methylation studies of 19 diseases using GWAS enrichment analysis. Next, we applied the same strategy for multi-omic integration of 20 datasets of multiple sclerosis (MS), and further validated the resulting module using both GWAS and risk-factor-associated genes from several independent cohorts. Our benchmark of modules showed that, in immune-associated diseases, modules inferred from clique-based methods were the most enriched for GWAS genes. The multi-omic case study using MS data revealed the robust identification of a module of 220 genes. Strikingly, most genes of the module were differentially methylated upon the action of one or several environmental risk factors in MS (n = 217, P = 10^-47) and were also independently validated for association with five different risk factors of MS, which further stressed the high genetic and epigenetic relevance of the module for MS.

Conclusions: We believe our analysis provides a workflow for selecting modules and our benchmark study may help further improvement of disease module methods. Moreover, we also stress that our methodology is generally applicable for combining and assessing the performance of multi-omic approaches for complex diseases.

Highlights

There exist few, if any, practical guidelines for predictive and falsifiable multi-omic data integration that systematically integrate existing knowledge
We believe our analysis provides a workflow for selecting modules and our benchmark study may help further improvement of disease module methods
A benchmark comparing 337 transcriptionally derived disease modules from 19 different diseases: We compiled a benchmark source of disease modules and summary statistics of genome-wide association (GWAS) datasets from 19 well-powered case-control studies (Supplementary Table 1), some of which were previously used in the DREAM topological disease module challenge [6]

Summary

Introduction

There exist few, if any, practical guidelines for predictive and falsifiable multi-omic data integration that systematically integrate existing knowledge. Genes that are associated with diseases are more likely to interact with each other than with non-disease-associated genes, forming multi-omic network disease modules [3, 4]. Owing to the incompleteness of the underlying multi-omic interactions, the networks are often modeled as effective gene-gene interactions, using, for example, the STRING database [5]. Network modules might therefore be ideal tools for multi-omic analysis. However, the performance of different module inference methods remains poorly understood, which creates the need for transparent evaluation of these methods based on objective benchmarks across various diseases and omics. Genomic concordance has been suggested as a multi-omic validation principle [4, 6], i.e., modules derived from one omic, such as gene expression or DNA methylation, should be enriched for disease-associated single nucleotide polymorphisms (SNPs).
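
The genomic concordance idea can be illustrated with a simple over-representation test. The sketch below is not the enrichment procedure used in the paper, which this section does not describe; it assumes a plain one-sided hypergeometric test as a stand-in, and the function names and toy gene identifiers (`g0` to `g9`) are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_gwas, n_module, n_overlap):
    """One-sided hypergeometric tail P(X >= n_overlap).

    X is the number of GWAS-associated genes found when n_module genes
    are drawn without replacement from n_background genes, of which
    n_gwas are GWAS-associated.
    """
    max_overlap = min(n_gwas, n_module)
    total = comb(n_background, n_module)
    tail = sum(
        comb(n_gwas, k) * comb(n_background - n_gwas, n_module - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


def module_enrichment(module_genes, gwas_genes, background_genes):
    """Return (overlap size, p-value) for one module against one GWAS gene set.

    Assumption: genes outside the background (e.g. not present in the
    interaction network) are dropped from both sets before testing.
    """
    background = set(background_genes)
    module = set(module_genes) & background
    gwas = set(gwas_genes) & background
    overlap = module & gwas
    p_value = hypergeom_enrichment_p(
        len(background), len(gwas), len(module), len(overlap)
    )
    return len(overlap), p_value


# Toy example with hypothetical gene identifiers.
background = [f"g{i}" for i in range(10)]
gwas_hits = ["g0", "g1", "g2"]
module = ["g0", "g1", "g5"]
n_shared, p = module_enrichment(module, gwas_hits, background)
print(f"shared genes: {n_shared}, enrichment p-value: {p:.4f}")
```

A test on gene lists like this ignores gene length and linkage disequilibrium, both of which affect how SNP-level GWAS signals map to genes, so it should be read only as an illustration of the concordance principle.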

Methods

Results

Discussion

Conclusion