A symbolic approach to gene expression time series analysis

F. de A.T. de Carvalho, I.G. Costa, M.C.P. de Souto

doi:10.1109/sbrn.2002.1181430

Abstract

In the analysis of gene expression time series, emphasis has been placed on capturing shape similarity (or dissimilarity). A number of proximity functions have been proposed for this task. However, none of them suitably measures shape similarity (or dissimilarity) when the data contain multiple gene expression time series, unless the data receive special handling. This paper proposes a symbolic description of multiple gene expression time series, in which each variable takes a time series as its value, in conjunction with a version of a proximity measure. In this symbolic approach, the shape similarity of each time series is calculated independently, and the results are aggregated at the end. Gene expression data from five distinct time series are presented to a symbolic dynamical clustering method and a self-organising map algorithm. The quality of the results is evaluated using gene annotation, which allows the adequacy of the proposal to be verified.
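
The idea of computing shape dissimilarity per series and aggregating afterwards can be illustrated with a minimal sketch. The abstract does not name the proximity measure or the aggregation rule, so the choices here are assumptions made only for illustration: a Pearson-correlation dissimilarity per series and a plain sum across series. The function names `pearson_dissimilarity` and `symbolic_dissimilarity` are hypothetical.

```python
from math import sqrt


def pearson_dissimilarity(x, y):
    """Return 1 - Pearson correlation of two equal-length series.

    Assumed shape measure (not taken from the paper): 0 means the
    same shape, 2 means exactly opposite shapes.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        # Flat profile: correlation is undefined; treating it as
        # uncorrelated is an assumption of this sketch.
        return 1.0
    sxy = sum(a * b for a, b in zip(dx, dy))
    return 1.0 - sxy / sqrt(sxx * syy)


def symbolic_dissimilarity(gene_a, gene_b):
    """Aggregate per-series dissimilarities between two genes.

    Each gene is a list of time series, one per experiment (one
    symbolic variable per series). Series are compared pairwise and
    never concatenated, so they may differ in length across
    experiments. Summation is an assumed aggregation rule.
    """
    if len(gene_a) != len(gene_b):
        raise ValueError("genes must be described by the same series")
    return sum(
        pearson_dissimilarity(sa, sb) for sa, sb in zip(gene_a, gene_b)
    )


# Toy data (invented for illustration): two experiments per gene.
gene_a = [[1.0, 2.0, 3.0], [5.0, 3.0, 1.0, 0.0]]
gene_b = [[2.0, 4.0, 6.0], [10.0, 6.0, 2.0, 0.0]]  # same shapes as gene_a
gene_c = [[3.0, 2.0, 1.0], [0.0, 1.0, 3.0, 5.0]]   # reversed shapes

d_ab = symbolic_dissimilarity(gene_a, gene_b)  # close to 0
d_ac = symbolic_dissimilarity(gene_a, gene_c)  # close to the maximum of 4
```

The resulting pairwise values could then be passed to any clustering method that accepts a dissimilarity; the symbolic dynamical clustering and self-organising map variants mentioned in the abstract are not reproduced here.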
