Abstract

Numerous time-course gene expression datasets have been generated for studying the biological dynamics that drive disease progression, and nearly as many methods have been proposed to analyse them. However, barely any method exists that can appropriately model time-course data while accounting for the heterogeneity that characterises many complex diseases. Most methods fulfil one of these requirements, but not both. The lack of appropriate methods hinders our ability to understand the disease process and pursue preventive treatments. We present a method that models time-course data in a personalised manner using Gaussian processes in order to identify differentially expressed genes (DEGs), and that combines the DEG lists at the pathway level using permutation-based empirical hypothesis testing in order to overcome the gene-level variability and inconsistencies prevalent in datasets from heterogeneous diseases. Our method can be applied to study the time-course dynamics, as well as specific time-windows, of heterogeneous diseases. We apply our personalised approach to three longitudinal type 1 diabetes (T1D) datasets, where the first two are used to determine perturbations taking place during early prognosis of the disease, as well as in time-windows before autoantibody positivity and T1D diagnosis, and the third is used to assess the generalisability of our method. By comparing to non-personalised methods, we demonstrate that our approach is biologically motivated and can reveal more insights into the progression of heterogeneous diseases. With its robust capabilities of identifying disease-relevant pathways, our approach could be useful for predicting events in the progression of heterogeneous diseases and even for biomarker identification.

Highlights

  • With the increasing affordability of high-throughput technologies, such as microarrays and RNA sequencing, genome-wide time-course gene expression data has become one of the most abundant and routinely analysed types of data[1] for studying and understanding the molecular mechanisms underlying various complex diseases[2].

  • Overview of our personalised Gaussian process (GP) regression and pathway detection method: in this paper, we present a personalised approach for identifying enriched pathways given time-course observations from multiple two-sample pairs.

  • In one model, a single GP regression is fitted to all samples from a case-control pair together, whereas in the separate model, GP regressions are fitted to cases and controls separately.
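The two models described above (one GP for the whole case-control pair versus separate GPs for cases and controls) can be compared through their marginal likelihoods. The following is a minimal sketch of that idea, not the authors' implementation: the squared-exponential kernel, the zero mean, the fixed hyperparameters, and the function names `rbf_kernel`, `log_marginal_likelihood` and `log_bayes_factor` are all assumptions made for illustration. In practice the hyperparameters would typically be optimised rather than fixed.

```python
import numpy as np


def rbf_kernel(t1, t2, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel between two vectors of time points."""
    diff = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)


def log_marginal_likelihood(t, y, noise=0.05, variance=1.0, lengthscale=1.0):
    """Log marginal likelihood of a zero-mean GP with fixed hyperparameters."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    K = rbf_kernel(t, t, variance, lengthscale) + noise * np.eye(len(t))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    return (
        -0.5 * y @ alpha
        - np.log(np.diag(L)).sum()            # equals -0.5 * log|K|
        - 0.5 * len(t) * np.log(2.0 * np.pi)
    )


def log_bayes_factor(t_case, y_case, t_ctrl, y_ctrl, **hyper):
    """Separate-model evidence minus single-model evidence for one pair.

    Positive values favour separate curves for case and control, which is
    read here (as an assumption) as evidence of differential expression.
    """
    together = log_marginal_likelihood(
        np.concatenate([t_case, t_ctrl]),
        np.concatenate([y_case, y_ctrl]),
        **hyper,
    )
    separate = (
        log_marginal_likelihood(t_case, y_case, **hyper)
        + log_marginal_likelihood(t_ctrl, y_ctrl, **hyper)
    )
    return separate - together


if __name__ == "__main__":
    # Hypothetical expression profiles for one gene in one case-control pair.
    t = np.linspace(0.0, 5.0, 8)
    control = np.sin(t)
    shifted_case = control + 1.5                      # clearly different curve
    similar_case = control + 0.01 * np.cos(3.0 * t)   # almost the same curve
    print("shifted case:", log_bayes_factor(t, shifted_case, t, control))
    print("similar case:", log_bayes_factor(t, similar_case, t, control))
```

Running this score for each gene within each case-control pair would give one DEG list per pair, which is the kind of input a pathway-level summary needs.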


Summary

INTRODUCTION

With the increasing affordability of high-throughput technologies, such as microarrays and RNA sequencing, genome-wide time-course gene expression data has become one of the most abundant and routinely analysed types of data[1] for studying and understanding the molecular mechanisms underlying various complex diseases[2]. Encapsulating a wealth of information regarding the prolonged or transient expression of a large set of activated genes[1], time-course data helps us understand and model the (multidimensional) dynamics of complex biological systems or phenomena, such as disease progression[1,3,4]. We compared the results of the proposed personalised approach with those of a population-wide method, the original results from Kallionpää et al.[31] and a third T1D dataset from Ferreira et al.[37]. This method can be applied to other heterogeneous diseases with a similar experimental design and extended to non-paired case-control datasets. Individual-specific gene-level results are summarised at the pathway level using permutation-based empirical hypothesis testing that is tailored
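The pathway-level summarisation mentioned above can be illustrated with a generic permutation test. This is a sketch under stated assumptions rather than the authors' tailored procedure: the pathway score (the number of pairs with at least one DEG in the pathway), the null model (random gene sets of the same size), and the names `pathway_score` and `empirical_p_value` are all hypothetical choices made for illustration.

```python
import random


def pathway_score(deg_lists, gene_set):
    """Count the pairs that have at least one DEG inside the gene set."""
    return sum(1 for degs in deg_lists if degs & gene_set)


def empirical_p_value(deg_lists, pathway, all_genes, n_perm=1000, seed=0):
    """Empirical p-value of a pathway score against random gene sets.

    The null distribution is built from random gene sets of the same size
    as the pathway (an assumed null model, chosen for simplicity).
    """
    rng = random.Random(seed)
    universe = sorted(all_genes)
    observed = pathway_score(deg_lists, pathway)
    hits = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(universe, len(pathway)))
        if pathway_score(deg_lists, random_set) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical data: 200 genes and four case-control pairs. Each pair
    # has a different DEG from the same pathway, so no single gene is
    # consistent across individuals, but the pathway is.
    genes = {f"g{i}" for i in range(200)}
    pathway = {f"g{i}" for i in range(5)}
    deg_lists = [{f"g{i}", f"g{50 + i}", f"g{100 + i}"} for i in range(4)]
    print(empirical_p_value(deg_lists, pathway, genes))
```

In the toy data, each individual contributes a different DEG, so a gene-level comparison across individuals would find nothing shared, while the pathway-level score picks up the common signal.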

RESULTS
DISCUSSION
CODE AVAILABILITY
