A Fuzzy Approach For Clustering Gene Expression Time Series Data

Sadiq Hussain, G.C. Hazarika

doi:10.5121/ijcsit.2011.3415

Abstract

Identifying groups of genes that manifest similar expression patterns is crucial in the analysis of gene expression time series data, and choosing a measure of similarity or distance between profiles is an important part of that task. Time series expression experiments are used to study a wide range of biological systems. More than 80% of all time series expression datasets are short (8 time points or fewer), and these datasets present unique challenges. Because the number of genes profiled is large (often tens of thousands) and the number of time points is small, many patterns are expected to arise at random, and most clustering algorithms are unable to distinguish between real and random patterns. Moreover, the shortness of gene expression time series limits the use of conventional statistical models and techniques for time series analysis. To address this problem, this paper proposes a fuzzy clustering algorithm for short time series, which clusters profiles based on the similarity of their relative change in expression level and the corresponding temporal information. One of the major advantages of fuzzy clustering is that a gene can belong to more than one group, revealing distinctive features of each gene's function and regulation.
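
The idea of comparing profiles by relative change and timing, and of letting a gene belong to several groups, can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the distance compares slopes between consecutive time points (one common way to capture relative change with uneven sampling), and the memberships use the standard fuzzy c-means formula with fuzzifier `m`. The function names `slope_distance` and `fuzzy_memberships`, the time points, and the two example cluster centers are hypothetical.

```python
import math


def slope_distance(x, v, t):
    """Distance between two profiles based on their slopes (assumed form).

    x, v: expression values at the time points t (same length).
    Profiles with the same shape but different baselines have distance 0.
    """
    total = 0.0
    for k in range(len(t) - 1):
        dt = t[k + 1] - t[k]                  # interval length (may be uneven)
        slope_x = (x[k + 1] - x[k]) / dt      # rate of change of profile x
        slope_v = (v[k + 1] - v[k]) / dt      # rate of change of profile v
        total += (slope_x - slope_v) ** 2
    return math.sqrt(total)


def fuzzy_memberships(profile, centers, t, m=2.0):
    """Standard fuzzy c-means membership of one profile in each cluster."""
    d = [slope_distance(profile, c, t) for c in centers]
    # If the profile coincides with one or more centers, share full
    # membership among them to avoid dividing by zero.
    zeros = [i for i, di in enumerate(d) if di == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exponent for dj in d) for di in d]


# Hypothetical example: four unevenly spaced time points, two cluster centers.
t = [0, 1, 2, 4]
rising = [0, 1, 2, 4]     # slopes 1, 1, 1
falling = [4, 3, 2, 0]    # slopes -1, -1, -1

gene = [0, 1, 2, 2]       # rises, then plateaus
u = fuzzy_memberships(gene, [rising, falling], t)
print([round(ui, 3) for ui in u])
```

In this sketch the gene is assigned mostly to the rising cluster but keeps a small membership in the falling one, which is the kind of partial assignment that hard clustering cannot express.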
