Time series data mining

Λεωνίδας Καραμητόπουλος

doi:10.12681/eadd/18348

Abstract

In this dissertation, we investigate techniques for efficiently applying Time Series Data Mining methods to very large databases. The main tasks of these methods are clustering, classification, novelty detection, motif discovery, and rule discovery. At the core of these tasks lies the concept of similarity, since most of them require searching for similar patterns. The temporal nature of the data raises two special issues for similarity search. The first is the definition of an appropriate similarity measure that allows imprecise matches among time series. The second is the representation of time series so as to reduce the intrinsically high dimensionality of this type of data. Our research covers both univariate and multivariate time series. In the former case, similarity is sought among one-dimensional time series, whereas in the latter it is sought among objects that each consist of a set of time series. This work makes five major contributions. First, we propose a Time Series Data Mining approach to the task of control chart pattern recognition, demonstrating that Time Series Data Mining techniques can handle tasks that are traditionally approached with application-specific methods. Second, we present a novel representation for dimensionality reduction, along with an appropriate measure, in order to improve the quality of similarity search while retaining the required efficiency. Third, we propose a new technique that aims at accelerating one-nearest-neighbor similarity search. This technique applies a representation to the original time series and subsequently partitions the search space into a number of clusters. Fourth, we present a novel approach to multivariate time series similarity search that includes a representation based on Principal Component Analysis and a new technique for measuring similarity among multivariate objects. Fifth, we provide an extensive literature review of multivariate time series data mining. All methods proposed in this dissertation have been experimentally evaluated on the quality of similarity search across a wide range of real-world and synthetic datasets.
