Abstract

(1) Background: Complex genetic relationships, including gene-gene (G × G; epistasis), gene(n), and gene-environment (G × E) interactions, explain a substantial portion of the heritability in multiple sclerosis (MS). Machine learning and data mining methods are promising approaches for uncovering higher-order genetic relationships, but their use in MS has been limited. (2) Methods: Association rule mining (ARM), a combinatorial rule-based machine learning algorithm, was applied to genetic data for non-Latinx MS cases (n = 207) and controls (n = 179). The objective was to identify patterns (rules) among the known MS risk variants, including HLA-DRB1*15:01 presence, HLA-A*02:01 absence, and 194 of the 200 common autosomal variants. Probabilistic measures (confidence and support) were used to mine rules. (3) Results: 114 rules met the minimum requirements of 80% confidence and 5% support. The top-ranking rule by confidence consisted of HLA-DRB1*15:01, SLC30A7-rs56678847 and AC093277.1-rs6880809; carriers of these variants had a significantly greater risk for MS (odds ratio = 20.2, 95% CI: 8.5, 37.5; p = 4 × 10⁻⁹). Several variants were shared across rules; the most common, INTS8-rs78727559, appeared in 32.5% of rules. (4) Conclusions: In summary, by applying a robust analytical framework to a modestly sized study population, we provide evidence that specific combinations of MS risk variants disproportionately confer elevated risk.
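
The two probabilistic measures named in the Methods can be illustrated with a minimal Python sketch. It assumes each participant is encoded as a set of item labels (for example a variant-carrier flag, an HLA-A*02:01-absence flag, or a case-status item) and that rules take the form antecedent → consequent; this encoding and the item names V1, V2 and MS are hypothetical placeholders, not the study's data or code. Support is the fraction of participants carrying every item in the rule, and confidence is support(antecedent ∪ consequent) / support(antecedent); the 80% and 5% defaults mirror the thresholds reported above.

```python
from typing import FrozenSet, List

# Hypothetical encoding (an assumption, not the study's): one frozenset of
# item labels per participant.
Record = FrozenSet[str]


def support(records: List[Record], itemset: FrozenSet[str]) -> float:
    """Fraction of records that contain every item in `itemset`."""
    hits = sum(1 for r in records if itemset <= r)
    return hits / len(records)


def confidence(records: List[Record], antecedent: FrozenSet[str],
               consequent: FrozenSet[str]) -> float:
    """Estimate of P(consequent | antecedent) = supp(A ∪ C) / supp(A)."""
    s_a = support(records, antecedent)
    if s_a == 0:
        return 0.0  # antecedent never observed; treat the rule as uninformative
    return support(records, antecedent | consequent) / s_a


def passes(records: List[Record], antecedent: FrozenSet[str],
           consequent: FrozenSet[str], min_conf: float = 0.80,
           min_supp: float = 0.05) -> bool:
    """True if the rule meets both the support and confidence thresholds."""
    return (support(records, antecedent | consequent) >= min_supp
            and confidence(records, antecedent, consequent) >= min_conf)


# Toy data with placeholder items V1, V2 and MS (not study data).
records = [
    frozenset({"V1", "V2", "MS"}),
    frozenset({"V1", "V2", "MS"}),
    frozenset({"V1", "V2", "MS"}),
    frozenset({"V1", "V2", "MS"}),
    frozenset({"V1", "MS"}),
    frozenset({"V2"}),
    frozenset({"V1"}),
    frozenset({"V2"}),
]

print(passes(records, frozenset({"V1", "V2"}), frozenset({"MS"})))  # True
print(passes(records, frozenset({"V2"}), frozenset({"MS"})))        # False
```

In this toy example, the rule {V1, V2} → {MS} has support 0.5 and confidence 1.0, so it passes both thresholds, while {V2} → {MS} has a confidence of about 0.67 and is filtered out.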

Highlights

  • Applying a robust analytical framework to a modestly sized study population, we provide evidence that specific combinations of multiple sclerosis (MS) risk variants disproportionately confer elevated risk

  • Association rule mining (ARM) is a rule-based machine learning method that relies on the Apriori algorithm for efficient mining of association rules within large datasets [22,23,24]

  • ARM was originally developed for market basket analysis of purchasing patterns in retail transactions, but it has since been applied to diverse relational datasets, including discerning multimorbidity patterns in administrative claims data and characterizing complex genetic relationships in simulated data [25,26]
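
The level-wise idea behind the Apriori algorithm cited above can be sketched in Python. This is a textbook-style illustration, not the implementation used in the study (which this summary does not describe); it assumes each participant is a set of item labels and uses placeholder items A, B and C. The efficiency gain comes from the Apriori property: every superset of an infrequent itemset is itself infrequent, so candidates containing an infrequent subset are pruned before their support is counted.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def apriori(records: List[FrozenSet[str]],
            min_supp: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support is at least `min_supp`."""
    n = len(records)
    # Level 1 candidates: every individual item seen in the data.
    candidates = [frozenset({item}) for item in {i for r in records for i in r}]
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while candidates:
        # Count support for each size-k candidate; keep those above threshold.
        level: Dict[FrozenSet[str], float] = {}
        for cand in candidates:
            supp = sum(1 for r in records if cand <= r) / n
            if supp >= min_supp:
                level[cand] = supp
        frequent.update(level)

        # Join step: union pairs of this level's frequent itemsets into
        # candidates that are exactly one item larger.
        k += 1
        prev = list(level)
        joined = {a | b for a in prev for b in prev if len(a | b) == k}

        # Prune step (Apriori property): keep a size-k candidate only if all
        # of its size-(k-1) subsets were frequent at the previous level.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


# Toy transactions with placeholder items A, B and C (not study data).
records = [
    frozenset({"A", "B", "C"}),
    frozenset({"A", "B"}),
    frozenset({"A", "C"}),
    frozenset({"B", "C"}),
    frozenset({"A", "B", "C"}),
]
freq = apriori(records, min_supp=0.5)
for itemset, supp in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
```

With min_supp = 0.5 the sketch keeps the three single items (support 0.8) and the three pairs (support 0.6); {A, B, C} is generated as a candidate but dropped because its support is 0.4. Rules would then be derived from the frequent itemsets and filtered by confidence.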


Summary

Introduction

In non-Latinx whites, the heritability of MS is estimated to be 50% (95% confidence interval [CI]: 39–61%) [1]. Genetic variants explain 44.8% of the heritability for MS (h² = 22.4%) [2,3]; complex genetic interactions (gene-gene [G × G] and gene(n)), gene-environment (G × E) and gene-epigenome interactions, as well as intergenerational epigenetic inheritance, explain the majority of MS heritability (>55%) [4]. The principal impediments to elucidating these complex relationships are a paucity of comprehensive epidemiologic and multi-omic MS datasets, and the methodological and statistical challenges of detecting higher-order relationships in big data [12,13].
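
The heritability figures above are mutually consistent, as a quick check shows. Writing $h^2_{\text{total}}$ for the overall heritability estimate and $h^2_{\text{variants}}$ for the portion captured by genetic variants (notation chosen here for illustration, not taken from the source):

```latex
\[
\frac{h^2_{\text{variants}}}{h^2_{\text{total}}}
  = \frac{0.224}{0.50} = 0.448 \;(44.8\%),
\qquad
1 - 0.448 = 0.552 \;(55.2\% > 55\%)
\]
```

The remaining 55.2% is the share attributed above to complex genetic, gene-environment and epigenetic mechanisms.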
