A TIME SERIES KNOWLEDGE MINING FRAMEWORK EXPLOITING THE SYNERGY BETWEEN SUBSEQUENCE CLUSTERING AND PREDICTIVE MARKOVIAN MODELS

Vasile Georgescu

doi:10.25102/fer.2009.01.03

Abstract

This paper proposes a time series knowledge mining framework designed to exploit the synergy between subsequence time series clustering and predictive tools such as Hidden Markov Models. Many temporal data mining tasks rely heavily on the choice of representation scheme and dissimilarity measure. The first part presents a detailed representation taxonomy for numeric and symbolic time series and a comprehensive categorization of distance measures. The second part addresses subsequence time series clustering methods based on a sliding window and proposes, as an effective solution, a generalization of the Fuzzy C-Means algorithm based on the dynamic time warping distance. This is a shape-based distance that tolerates phase shifts in time as well as accelerations and decelerations along the time axis. It also makes it possible to determine the degree to which set-defined objects, such as subsequence time series and their cluster centroids (which are similar in nature), differ from each other. The third part discusses the integration of clustering algorithms with probabilistic predictive tools, such as discrete Markov chains or hidden Markov models. We apply these techniques to the clustering of non-overlapping sequences extracted from Standard and Poor’s 500 stock index historical data, and we suggest different integrations with Markovian models to improve predictive power.
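The building blocks named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it shows only the classic dynamic time warping recurrence with an absolute-difference local cost, the extraction of non-overlapping windows, and a first-order Markov transition estimate over cluster labels. The Fuzzy C-Means generalization itself is not reproduced, and the function names `dtw_distance`, `non_overlapping_windows`, and `transition_matrix` are hypothetical, chosen here for illustration.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW distance with absolute-difference local cost (assumed here)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # D[i, j] = cost of the best warping path aligning a[:i] with b[:j]
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def non_overlapping_windows(series, width):
    """Cut a series into consecutive windows of equal width, dropping any remainder."""
    series = np.asarray(series, dtype=float)
    k = len(series) // width
    return series[:k * width].reshape(k, width)

def transition_matrix(labels, n_states):
    """Row-normalized counts of transitions between consecutive cluster labels."""
    counts = np.zeros((n_states, n_states))
    for s, t in zip(labels[:-1], labels[1:]):
        counts[s, t] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # States never left keep an all-zero row instead of dividing by zero
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
```

Under these assumptions, a sequence and a phase-shifted copy of it, such as `[0, 1, 2]` and `[0, 0, 1, 2]`, have a DTW distance of zero, which is the tolerance to shifts that the abstract describes. Once each window has been assigned a cluster label, `transition_matrix` gives one simple way to estimate a discrete Markov chain over those labels.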
