Abstract

To date, several machine learning approaches have been proposed for the dynamic modeling of temporal omics data. Although they have yielded impressive results in terms of model accuracy and predictive ability, most of these applications are based on "black-box" algorithms, and the research community has called for more interpretable models. The recent eXplainable Artificial Intelligence (XAI) revolution offers a solution to this issue, in which rule-based approaches are highly suitable for explanatory purposes. Integrating the data mining process with functional annotation and pathway analyses is a further step towards more explanatory and biologically sound models. In this paper, we present a novel rule-based XAI strategy (including pre-processing, knowledge extraction and functional validation) for finding biologically relevant sequential patterns in longitudinal human gene expression data (GED). To illustrate the performance of our pipeline, we work on in vivo temporal GED collected over the course of a long-term dietary intervention in 57 subjects with obesity (GSE77962). As validation populations, we employ three independent datasets following the same experimental design. As a result, we validate the gene patterns initially extracted and demonstrate the suitability of our strategy for mining biologically relevant gene-gene temporal relations. Our whole pipeline has been released as open-source software and could be easily extended to other human temporal GED applications.

Highlights

  • Biological processes in humans are not single-gene based mechanisms, but complex systems controlled by regulatory interactions between thousands of genes

  • Main topics covered in this paper include: 1) preliminary concepts in association rule mining (ARM); 2) a methodological description of the proposed pipeline; 3) a description of the research problem and the datasets employed; 4) results, where we evaluate the performance of our pipeline in terms of the insights extracted from a discovery sample and their validation in independent cohorts; and 5) a discussion, where we examine the strengths of our proposal and list some drawbacks and challenges to be faced in future applications

  • To illustrate the performance of our method on human long-term intervention data, we downloaded a discovery dataset comprising 57 subjects with obesity participating in a long-term dietary program [47,48]


Summary

Introduction

Biological processes in humans are not single-gene based mechanisms, but complex systems controlled by regulatory interactions between thousands of genes. Within these gene regulatory networks, time delay is a common phenomenon, and genes interact with each other within a four-dimensional space [1]. Time-delayed gene regulation is especially present in long-term interventions, in which changes in gene expression reflect the response of genes to external factors and may cause subsequent changes in the expression of other genes. In most of the genome scans performed to date, the effects of each gene on the trait of interest have been interrogated one at a time, which offers limited capacity to capture the overall picture of gene networks and their temporal relations. There is still no clear picture of the dynamic trends in gene-gene interactions, and much of the heritability of complex human traits remains unexplained, a phenomenon termed the "missing heritability" problem [2].
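
To make the idea of a time-delayed gene-gene relation more concrete, the following is a minimal sketch of how a rule of the form "gene A changes at time t, then gene B changes at time t + lag" could be scored by support and confidence across subjects. It is an illustration only and not the pipeline presented in this paper: the function `rule_metrics`, the item labels such as `GENE_A_up`, and the toy data are all hypothetical, and the expression changes are assumed to have already been discretized into "up"/"down" items per time point.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: for each subject, a list of time points, where each
# time point is the set of discretized expression changes observed then.
# Gene names and labels are invented for illustration.
subjects: Dict[str, List[Set[str]]] = {
    "s1": [{"GENE_A_up"}, {"GENE_B_down"}, set()],
    "s2": [{"GENE_A_up"}, {"GENE_B_up"}, {"GENE_B_down"}],
    "s3": [set(), {"GENE_A_up"}, {"GENE_B_down"}],
    "s4": [{"GENE_B_down"}, set(), set()],
}


def rule_metrics(
    data: Dict[str, List[Set[str]]],
    antecedent: str,
    consequent: str,
    lag: int = 1,
) -> Tuple[float, float]:
    """Score the rule 'antecedent at t -> consequent at t + lag'.

    Support: fraction of subjects in which the full pattern occurs.
    Confidence: among subjects showing the antecedent at a time point
    that still has a follow-up `lag` steps later, the fraction in which
    the consequent follows.
    """
    n_antecedent = 0  # subjects with the antecedent at a usable time point
    n_both = 0        # subjects in which the consequent follows after `lag`
    for timeline in data.values():
        antecedent_seen = False
        pattern_seen = False
        for t in range(len(timeline) - lag):
            if antecedent in timeline[t]:
                antecedent_seen = True
                if consequent in timeline[t + lag]:
                    pattern_seen = True
        n_antecedent += antecedent_seen
        n_both += pattern_seen
    support = n_both / len(data) if data else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


support, confidence = rule_metrics(subjects, "GENE_A_up", "GENE_B_down", lag=1)
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

Counting each subject at most once keeps the metrics interpretable as proportions of the cohort, which is one possible design choice; counting every occurrence along a timeline would be an equally valid alternative under different assumptions.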

