Abstract

In biological systems that undergo processes such as differentiation, a clear concept of progression exists. We present a novel computational approach, called Sample Progression Discovery (SPD), to discover patterns of biological progression underlying microarray gene expression data. SPD assumes that individual samples of a microarray dataset are related by an unknown biological process (e.g., differentiation, development, cell cycle, disease progression), and that each sample represents one unknown point along the progression of that process. SPD aims to organize the samples in a manner that reveals the underlying progression and to simultaneously identify subsets of genes that are responsible for that progression. We demonstrate the performance of SPD on a variety of microarray datasets that were generated by sampling a biological process at different points along its progression, without providing SPD any information about the underlying process. When applied to a cell cycle time series microarray dataset, SPD was not provided any prior knowledge of the samples' time order or of which genes are cell-cycle regulated, yet SPD recovered the correct time order and identified many genes that have been associated with the cell cycle. When applied to B-cell differentiation data, SPD recovered the correct order of stages of normal B-cell differentiation and the linkage between preB-ALL tumor cells and their cell of origin, preB. When applied to mouse embryonic stem cell (ESC) differentiation data, SPD uncovered a landscape of ESC differentiation into various lineages and genes that represent both generic and lineage-specific processes. When applied to a prostate cancer microarray dataset, SPD identified gene modules that reflect a progression consistent with disease stages.
SPD may be best viewed as a novel tool for synthesizing biological hypotheses because it provides a likely biological progression underlying a microarray dataset and, perhaps more importantly, the candidate genes that regulate that progression.

Highlights

  • Biological processes of development, differentiation and aging are increasingly being described by the temporal ordering of highly orchestrated transcriptional programs [1]

  • Assuming the underlying progression can be reflected by gradual expression changes of subsets of genes, we present a novel computational approach, Sample Progression Discovery (SPD), to discover biological progression underlying a microarray dataset

  • Microarray time series data of the cell cycle were used to evaluate the performance of SPD

Summary

Introduction

Biological processes of development, differentiation and aging are increasingly being described by the temporal ordering of highly orchestrated transcriptional programs [1]. When such processes are analyzed with gene expression microarrays at specified time points, a variety of computational methods are available to identify which genes vary and how they vary across some or all of the time points [2,3,4,5,6]. SPD can be used to generate biological hypotheses about a progressive relationship among samples, and about the genes that serve as key candidate regulators of the underlying process.
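
The idea of recovering a progression from unordered samples can be illustrated with a small toy sketch. This is not the SPD algorithm: the text above does not specify how SPD orders samples, so a minimum spanning tree over pairwise Euclidean distances is used here purely as an assumed stand-in. The toy data, the hidden time labels, and the function names (`mst_edges`, `farthest_path`, `progression_order`) are all hypothetical, and the example is noise-free with every gene changing gradually along the progression.

```python
import math


def mst_edges(samples):
    """Prim's algorithm on the complete graph of Euclidean distances."""
    n = len(samples)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge connecting the current tree to a sample outside it.
        _, i, j = min(
            (math.dist(samples[i], samples[j]), i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges


def farthest_path(adj, start):
    """Breadth-first search; return the path from start to the last node reached."""
    parent = {start: None}
    queue = [start]
    for node in queue:  # the list grows while we iterate over it
        for neighbor in adj[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    node = queue[-1]  # farthest node from start, counted in tree edges
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def progression_order(samples):
    """Order samples along the longest path (in edges) of their spanning tree."""
    adj = {i: [] for i in range(len(samples))}
    for i, j in mst_edges(samples):
        adj[i].append(j)
        adj[j].append(i)
    one_end = farthest_path(adj, 0)[-1]
    return farthest_path(adj, one_end)


# Hypothetical toy data: six samples taken at hidden time points, given to the
# code in shuffled order. Each of the three "genes" changes gradually with time.
hidden_time = [3, 0, 4, 1, 5, 2]
samples = [(t, 2 * t, 10 - t) for t in hidden_time]

order = progression_order(samples)
recovered_times = [hidden_time[i] for i in order]
print(recovered_times)
```

The recovered order can only match the hidden one up to reversal, because distances between samples carry no information about which end of the progression is the start. Real expression data are noisy and most genes do not track the progression, which is why the abstract stresses identifying the relevant gene subsets together with the sample order.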
