Abstract

Background

Time-course microarray experiments produce vector gene expression profiles across a series of time points. Clustering genes based on these profiles is important for discovering functionally related and co-regulated genes. Early clustering algorithms do not take advantage of the ordering in a time-course study, although explicit use of this ordering should allow more sensitive detection of genes that display a consistent pattern over time. Peddada et al. [1] proposed a clustering algorithm that incorporates the temporal ordering using order-restricted statistical inference. This algorithm is, however, very time-consuming and hence inapplicable to most microarray experiments that contain a large number of genes. Its computational burden also makes it difficult to assess clustering reliability, which is a very important measure when clustering noisy microarray data.

Results

We propose a computationally efficient information criterion-based clustering algorithm, called ORICC, that also takes account of the ordering in time-course microarray experiments by embedding order-restricted inference into a model selection framework. Each gene is assigned to the profile it best matches, as determined by a newly proposed information criterion for order-restricted inference. We also developed a bootstrap procedure to assess ORICC's clustering reliability for every gene. Simulation studies show that the ORICC method is robust, always gives better clustering accuracy than Peddada's method, and is hundreds of times faster. Under some scenarios, its accuracy is also better than that of other existing clustering methods for short time-course microarray data, such as STEM [2] and the method of Wang et al. [3]. It is also computationally much faster than the method of Wang et al. [3].

Conclusion

Our ORICC algorithm, which takes advantage of the temporal ordering in time-course microarray experiments, provides good clustering accuracy while being much faster than Peddada's method. Moreover, the clustering reliability of each gene can be assessed, which is not possible with Peddada's method. In a real data example, the ORICC algorithm identifies new and interesting genes that previous analyses failed to reveal.
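The profile-matching idea and the bootstrap reliability check described in the abstract can be illustrated with a small sketch. This is not the ORICC algorithm: the abstract does not state the information criterion or the bootstrap scheme, so the code below makes several assumptions. It considers only two hypothetical candidate profiles (monotone increasing and monotone decreasing), fits each by order-restricted least squares using the pool-adjacent-violators algorithm, and scores them by residual sum of squares with no penalty term. The bootstrap resamples replicates within each time point, which is one plausible scheme and is likewise an assumption. All function names are illustrative.

```python
import random
from typing import Dict, List, Sequence


def pava_increasing(y: Sequence[float]) -> List[float]:
    """Least-squares non-decreasing fit to y (pool-adjacent-violators)."""
    blocks: List[list] = []  # each block holds [sum of values, count]
    for v in y:
        blocks.append([float(v), 1])
        # Pool neighbouring blocks while their means violate the ordering.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fit: List[float] = []
    for s, n in blocks:
        fit.extend([s / n] * n)
    return fit


def pava_decreasing(y: Sequence[float]) -> List[float]:
    """Least-squares non-increasing fit, obtained by negating the input."""
    return [-v for v in pava_increasing([-v for v in y])]


def rss(y: Sequence[float], fit: Sequence[float]) -> float:
    """Residual sum of squares between observed means and a fitted profile."""
    return sum((a - b) ** 2 for a, b in zip(y, fit))


def match_profile(means: Sequence[float]) -> str:
    """Assign a gene to the candidate profile with the smallest RSS.

    Assumption: plain RSS stands in for the paper's information criterion,
    which is not given in the abstract. Ties go to "increasing".
    """
    scores: Dict[str, float] = {
        "increasing": rss(means, pava_increasing(means)),
        "decreasing": rss(means, pava_decreasing(means)),
    }
    return min(scores, key=scores.get)


def time_point_means(replicates: Sequence[Sequence[float]]) -> List[float]:
    """Average the replicate measurements at each time point."""
    return [sum(r) / len(r) for r in replicates]


def bootstrap_reliability(
    replicates: Sequence[Sequence[float]], n_boot: int = 200, seed: int = 0
) -> float:
    """Fraction of bootstrap resamples that reproduce the original assignment.

    Assumption: replicates are resampled with replacement within each
    time point; the paper's actual procedure may differ.
    """
    rng = random.Random(seed)
    original = match_profile(time_point_means(replicates))
    agree = 0
    for _ in range(n_boot):
        resampled = [[rng.choice(r) for _ in r] for r in replicates]
        if match_profile(time_point_means(resampled)) == original:
            agree += 1
    return agree / n_boot


if __name__ == "__main__":
    # Hypothetical gene: two replicates at each of four time points.
    gene = [[1.0, 1.2], [2.0, 2.1], [3.0, 3.3], [5.0, 4.8]]
    print(match_profile(time_point_means(gene)))
    print(bootstrap_reliability(gene))
```

Because an unpenalised RSS always favours the least constrained candidate, a real criterion needs a complexity penalty before nested profiles (for example, a flat profile) can be compared fairly; that is the role the paper's information criterion plays and it is deliberately left out of this sketch.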

Highlights

  • Time-course microarray experiments produce vector gene expression profiles across a series of time points

  • Taking a different perspective on order-restricted inference, we propose a new order-restricted information criterion-based clustering (ORICC) algorithm, which is computationally much more efficient than Peddada's method

  • Simulation 3: In the third simulation, we examine the robustness of the ORICC algorithm


Summary

Introduction

Time-course microarray experiments produce vector gene expression profiles across a series of time points. Peddada et al. [1] proposed a clustering algorithm that incorporates the temporal ordering using order-restricted statistical inference. This algorithm is, however, very time-consuming and inapplicable to most microarray experiments that contain a large number of genes. Most other clustering algorithms for time-course data view observed temporal gene expression profiles as coming from underlying smooth curves and cluster genes based on estimated expression profiles obtained from nonparametric smoothing [2,13,14,15,16,17,18,19,20,21,22,23,24,25]. While these algorithms work well for relatively long time series data, they are not appropriate for short time-course microarray data, which are often taken at a small number of sparse time points. The profile-matching clustering strategy differs from most unsupervised clustering, where a representation of a cluster is often calculated only after the cluster is formed.

