Abstract

Many research teams perform numerous genetic, transcriptomic, proteomic and other types of omic experiments to understand molecular, cellular and physiological mechanisms of disease and health. Often (but not always), the results of these experiments are deposited in publicly available repository databases. These data records often include phenotypic characteristics following genetic and environmental perturbations, with the aim of discovering the underlying molecular mechanisms that lead to the phenotypic responses. Usually only a constrained set of phenotypic characteristics is recorded, and these are mostly hypothesis-driven or possible to record within financial or practical constraints. We present a novel proof-of-principle computational approach for combining publicly available gene-expression data from control/mutant animal experiments that exhibit a particular phenotype, and we use this approach to predict unobserved phenotypic characteristics in new experiments (data derived from EBI's ArrayExpress and ExpressionAtlas, respectively). We utilised available microarray gene-expression data for two phenotypes (starvation-sensitive and sterile) in Drosophila. The data were combined using a linear mixed-effects model that included consecutive principal components to account for variability between experiments, in conjunction with Gene Ontology enrichment analysis. We show how available data can be ranked by their likelihood of exhibiting these two phenotypes using a random forest. The results of our study show that it is possible to integrate seemingly different gene-expression microarray data and predict a potential phenotypic manifestation with a relatively high degree of confidence (>80% AUC). This provides thus far unexplored opportunities for inferring unknown and unbiased phenotypic characteristics from already performed experiments, in order to identify studies for future analyses.
Molecular mechanisms associated with gene and environment perturbations are intrinsically linked and give rise to a variety of phenotypic manifestations. Therefore, unravelling the phenotypic spectrum can help to gain insights into disease mechanisms associated with gene and environmental perturbations. Our approach uses public data that are set to increase in volume, thus providing value for money.
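
To make the ">80% AUC" figure concrete: AUC is the probability that a randomly chosen experiment with the phenotype is ranked above a randomly chosen experiment without it. The following is a minimal sketch of that calculation only, not the study's pipeline. The function name `auc_from_scores` and the scores and labels are hypothetical illustrations, standing in for the phenotype likelihoods a classifier such as a random forest would produce.

```python
def auc_from_scores(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive example
    receives the higher score. Tied pairs count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical phenotype likelihoods for six experiments (illustrative values,
# not taken from the paper); label 1 means the phenotype was observed.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]

# Rank experiments from most to least likely to exhibit the phenotype.
ranked = sorted(zip(scores, labels), reverse=True)

auc = auc_from_scores(scores, labels)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to a random ranking and 1.0 to a perfect one, so a value above 0.8 means that in more than four out of five such pairs the experiment with the phenotype is ranked higher.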

Highlights

  • Despite the flood of molecular omics data, with a few notable exceptions, such as the Genotype-Tissue Expression (GTEx) project [1], most datasets are rarely re-used, mainly due to challenges with combining the data from different sources

  • In this paper we present a novel computational approach for integrating gene-expression data for two specific phenotypes in Drosophila from the vast and largely unutilised freely available public repositories

  • This integration is multi-layered, with phenotypic information derived from a species-specific database (FlyBase) and gene-expression data from the largest repository of publicly available genomic data, the ExpressionAtlas at the European Bioinformatics Institute



Introduction

Despite the flood of molecular omics data, with a few notable exceptions, such as the Genotype-Tissue Expression (GTEx) project [1], most datasets are rarely re-used, mainly due to challenges with combining the data from different sources. In most experimental studies, additional measures are made of biochemical and physiological changes, and of the changes in phenotypic characteristics that they bring about. Phenotypes can include, for instance, morphology, behaviour and pathology. Only a limited number of phenotypes are recorded, due to various study constraints. A sub-phenotype is one that underlies the study phenotype but, crucially, is influenced by fewer genes [2]. Sub-phenotypes of Parkinson’s Disease (PD) can include olfactory impairment, gut function disturbance, motor impairments and cognitive decline, each of which may be mediated by subsets of the genes that together result in PD pathology. Quantifying a wide variety of sub-phenotypes associated with animal models of a disease could help to identify causal mechanisms
