Abstract
Motivation
Deciphering gene interaction networks (GINs) from time-course gene expression (TCGx) data is highly valuable for understanding gene behaviors (e.g., activation, inhibition, time-lagged causality) at the system level. Existing methods usually use a global or local proximity measure to infer GINs from a single dataset. Because the noise in a single dataset can hardly be resolved from that dataset alone, the results are sometimes unreliable. Moreover, these proximity measures cannot handle the coexistence of the various positive, negative and time-lagged gene interactions found in vivo.
Methods and results
We propose to infer reliable GINs from multiple TCGx datasets using a novel conserved subsequential pattern of gene expression. A subsequential pattern is a maximal subset of genes whose expression profiles share a positive, negative or time-lagged correlation with one expression template, each on its own subset of time points. Based on these patterns, a GIN can be built from each dataset. Assuming that reliable gene interactions are detected repeatedly, we use the gene pairs conserved across the individual GINs of the multiple TCGx datasets to construct a reliable GIN for a species. We apply our method to six TCGx datasets related to the yeast cell cycle and validate the reliable GINs using protein interaction networks, biopathways and transcription factor-gene regulations. We also compare the reliable GINs with GINs reconstructed from single datasets by a global proximity measure, the Pearson correlation coefficient (PCC). The results show that our reliable GINs achieve much better prediction performance, especially much higher precision. Functional enrichment analysis also suggests that gene sets in a reliable GIN are more functionally significant. Our method is especially useful for deciphering GINs from multiple TCGx datasets of less-studied organisms, for which little knowledge is available beyond gene expression data.
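The integration step described above (build one GIN per dataset, then keep only the gene pairs detected repeatedly) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it does not mine subsequential patterns on subsets of time points. Instead, as a hypothetical stand-in, each per-dataset GIN links two genes when their whole-profile Pearson correlation, taken at the best time lag and with either sign, reaches a threshold. The function names, the `threshold`, `max_lag` and `min_support` parameters, and the dict-of-lists data layout are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no correlation signal
    return sxy / sqrt(sxx * syy)


def best_lagged_corr(x, y, max_lag):
    """Return (r, lag) with the largest |r| for lags in [-max_lag, max_lag].

    lag > 0 pairs x[t] with y[t + lag] (y follows x); lag < 0 pairs
    y[t] with x[t - lag] (x follows y). A negative r is kept, so
    anti-correlated (inhibition-like) pairs are also detected.
    """
    n = len(x)
    best = (0.0, 0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[: n - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[: n + lag]
        if len(xs) < 3:
            continue  # too few overlapping time points
        r = pearson(xs, ys)
        if abs(r) > abs(best[0]):
            best = (r, lag)
    return best


def build_gin(dataset, threshold, max_lag):
    """Hypothetical per-dataset GIN: dataset maps gene -> expression list.

    Pairs come from sorted gene names, so the same pair is spelled the
    same way in every dataset and can be counted across datasets.
    """
    edges = set()
    for g1, g2 in combinations(sorted(dataset), 2):
        r, _ = best_lagged_corr(dataset[g1], dataset[g2], max_lag)
        if abs(r) >= threshold:
            edges.add((g1, g2))
    return edges


def conserved_gin(gins, min_support):
    """Keep gene pairs that appear in at least min_support per-dataset GINs."""
    counts = Counter(edge for gin in gins for edge in gin)
    return {edge for edge, c in counts.items() if c >= min_support}
```

For example, `conserved_gin([build_gin(d, 0.8, 2) for d in datasets], 2)` would keep the pairs found in at least two datasets; the values 0.8, 2 and 2 are arbitrary placeholders. Raising `min_support` keeps fewer pairs, which is expected to favor precision over recall, in line with the assumption that reliable interactions recur across datasets.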
Highlights
Gene interactions are indispensable workers in complicated biological processes and molecular functions
These data are used for inferring gene interaction networks (GINs) because they are obtained by different biologists
We compare the inference performances by integrative reliable GINs and GINs produced by PCC on the three TCGxCC datasets
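Comparing inference performance, as in the highlight above, amounts to scoring a predicted edge set against reference interactions (the abstract lists protein interaction networks, biopathways and transcription factor-gene regulations). A minimal Python sketch, assuming edges are undirected gene pairs; `precision_recall` and `as_undirected` are hypothetical helper names, and the source does not state which metrics beyond precision were reported.

```python
def as_undirected(edges):
    """Normalize (g1, g2) tuples to frozensets so that (A, B) matches (B, A)."""
    return {frozenset(e) for e in edges}


def precision_recall(predicted, reference):
    """Precision = TP / |predicted|, recall = TP / |reference| over undirected edges."""
    pred, ref = as_undirected(predicted), as_undirected(reference)
    tp = len(pred & ref)  # predicted pairs confirmed by the reference network
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    return precision, recall
```

Because reference databases are incomplete, a predicted pair missing from the reference is not necessarily a false positive.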
Summary
Gene interactions are indispensable workers in complicated biological processes and molecular functions. Single-gene targeted approaches use the expression level of a gene as the prediction target and the expression levels of other genes (for example, transcription factors) as features, and learn the relationship between the target gene and the other genes using machine learning algorithms and gene-feature selection methods, such as various regression methods [3] and random forest methods [6]. With these two kinds of approaches, gene interactions were determined using all the conditions or time points.
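The regression flavor of the single-gene targeted approach can be sketched as follows: regress the target gene's expression on candidate regulators across all time points and rank the regulators by coefficient magnitude. This is a minimal pure-Python illustration assuming ordinary least squares with an intercept, solved through the normal equations; it is not the specific regression method of [3] and performs no feature selection. `solve`, `fit_target` and `rank_regulators` are hypothetical names.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_target(expr, target, regulators):
    """OLS of the target gene on the regulators (plus intercept) over all time points."""
    y = expr[target]
    rows = [[1.0] + [expr[g][t] for g in regulators] for t in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    return dict(zip(regulators, coef[1:]))  # drop the intercept


def rank_regulators(expr, target, regulators):
    """Order regulators by absolute regression weight, strongest first."""
    weights = fit_target(expr, target, regulators)
    return sorted(weights, key=lambda g: abs(weights[g]), reverse=True)
```

Ranking by absolute coefficient is only meaningful when the regulator profiles are on comparable scales (for example, after standardization). With more regulators than time points the normal equations become singular, which is one reason feature selection is needed in practice.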