Clustering multivariate time series by genetic multiobjective optimization

S. Bandyopadhyay, R. Baragona, U. Maulik

doi:10.1007/bf03263533

Abstract

Methods for clustering univariate time series often rely on choosing features relevant to the problem at hand and seeking clusters according to their measurements, for instance autoregressive coefficients, spectral measures, time delays at selected frequencies, and special characteristics such as trend and seasonality. In this context, features based on indexes of goodness-of-fit seem worthy of special attention. Similar approaches have been suggested for clustering sets of multivariate time series. For example, clusters of regional economies may be formed based on sets of macroeconomic time series for each country. In a multivariate framework, however, the features of interest are more difficult to extract than for univariate time series. Indeed, multivariate time series may differ not only in structure or pairwise correlation but in dimensionality and internal correlation as well. We propose some measures of predictability and interpolability as indexes of goodness-of-fit for multivariate time series that may serve as useful features to find clusters in the data. The capability of a clustering method to distinguish clusters of multivariate time series may be evaluated by using several internal cluster validity criteria. As each criterion is known to measure some special characteristics of the extracted features, multiobjective clustering methods and a genetic algorithm implementation are used to perform such evaluation. The concept of Pareto optimality in multiobjective genetic algorithms is used to perform a simultaneous search over multiple criteria. The advantage of using genetic algorithms for multiobjective optimization is that they maintain a population of solutions, most of them non-dominated in the Pareto sense, so that the whole Pareto front may be provided in a single run.
The effectiveness of the measures of predictability and interpolability in conjunction with the multiobjective genetic optimization procedure for outlining the cluster structure of a set of multivariate time series will be studied on a set of real time series data. Furthermore, a simulation experiment will be presented to compare the performance of the proposed procedure with procedures arising from alternative approaches.
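
To make the notion of non-dominated solutions concrete, the following is a minimal sketch in Python of how a Pareto front could be extracted from a population of candidate clusterings. It is an illustration only, not the authors' implementation: the function names `dominates` and `pareto_front`, the convention that every criterion is to be minimized, and the example scores are all assumptions introduced here.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if score vector a Pareto-dominates b.

    Assumption: every criterion is to be minimized. a dominates b when it is
    no worse on all criteria and strictly better on at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of the non-dominated score vectors."""
    front = []
    for i, candidate in enumerate(scores):
        dominated = any(
            dominates(other, candidate)
            for j, other in enumerate(scores)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Hypothetical population: each tuple holds two validity criteria
# (both minimized) computed for one candidate clustering.
population_scores = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
front_indices = pareto_front(population_scores)
print(front_indices)
```

In this hypothetical population the third candidate is dominated by the second, which is at least as good on both criteria and strictly better on one, while the remaining three represent different trade-offs between the two criteria and together form the front.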
