A hybrid algorithm for clustering of time series data based on affinity search technique.

Saeed Aghabozorgi,Hamid A Jalab,Tutut Herawan,Mohammad Amin Shaygan,Teh Ying Wah,Alireza Jalali

doi:10.1155/2014/562194

Open Access

https://doi.org/10.1155/2014/562194

Journal: TheScientificWorldJournal	Publication Date: Jan 1, 2014
Citations: 102	License type: CC BY 3.0

Affiliation: University of Malaya

Abstract

Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped into subclusters based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. This model makes two contributions: (1) it is more accurate than other conventional and hybrid approaches and (2) it determines the similarity in shape among time series data with low complexity. To evaluate its accuracy, the proposed model is tested extensively using synthetic and real-world time series datasets.
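The two-step idea in the abstract (subclusters by similarity in time, then a k-Medoids merge by similarity in shape) can be illustrated with a minimal sketch. It is not the authors' implementation. Several parts are assumptions made only for illustration: a greedy Euclidean threshold grouping stands in for the affinity search technique named in the title, each subcluster is summarized by its pointwise mean, dynamic time warping (DTW) is used as the shape measure, and k-Medoids is seeded with a farthest-first rule. All function names and the `threshold` parameter are hypothetical.

```python
import math

def euclidean(a, b):
    # Point-to-point distance: compares values at the same time index.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    # Classic dynamic time warping distance (assumed shape measure).
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = abs(a[i - 1] - b[j - 1]) + step
    return cost[n][m]

def subcluster_in_time(series, threshold):
    # Step 1 stand-in: a series joins the first group whose first member
    # lies within `threshold` (Euclidean); otherwise it starts a new group.
    groups = []
    for idx, s in enumerate(series):
        for g in groups:
            if euclidean(s, series[g[0]]) <= threshold:
                g.append(idx)
                break
        else:
            groups.append([idx])
    return groups

def mean_series(series, members):
    # Prototype of a subcluster: pointwise mean of its members (assumption).
    return [sum(series[i][t] for i in members) / len(members)
            for t in range(len(series[0]))]

def assign(dist, medoids):
    # Label each item with the position of its nearest medoid.
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]

def k_medoids(dist, k, n_iter=50):
    # Alternating k-medoids on a precomputed distance matrix.
    # Requires k <= number of items; seeding is farthest-first from item 0.
    n = len(dist)
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n),
                           key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(n_iter):
        labels = assign(dist, medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            updated.append(
                min(members, key=lambda i: sum(dist[i][j] for j in members))
                if members else medoids[c])
        if updated == medoids:
            break
        medoids = updated
    return assign(dist, medoids)

def two_step_cluster(series, threshold, k):
    groups = subcluster_in_time(series, threshold)        # similarity in time
    protos = [mean_series(series, g) for g in groups]
    dist = [[dtw(p, q) for q in protos] for p in protos]  # similarity in shape
    proto_labels = k_medoids(dist, k)
    labels = [0] * len(series)
    for g, lab in zip(groups, proto_labels):
        for i in g:
            labels[i] = lab
    return labels

# Toy data: two early bumps, one late bump, two ramps.
demo_series = [
    [0, 0, 1.0, 0, 0, 0, 0, 0],
    [0, 0, 1.1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1.0, 0, 0],
    [0, 1, 2, 3, 4, 5, 6, 7.0],
    [0, 1, 2, 3, 4, 5, 6, 7.2],
]
demo_labels = two_step_cluster(demo_series, threshold=0.5, k=2)
```

On this toy input the late bump starts its own subcluster in the first step, because its peak falls at a different time index. The second step then merges it with the early bumps, because DTW can align the peaks. Only the subcluster prototypes are compared with DTW, not every pair of series, which is one plausible reading of the abstract's low-complexity claim.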

Highlights

Clustering is considered the most important unsupervised learning problem
The experiment on the proposed model is conducted with one synthetic dataset and 12 real-world datasets of various domains and sizes obtained from the UCR Time Series Data Mining Archive [64]
We illustrated the advantages of using some time series data as prototypes to cluster time series data based on the similarity in shape

Summary

Introduction

Clustering is considered the most important unsupervised learning problem. The clustering of time series data is advantageous in exploratory data analysis and summary generation. Conventional approaches employed in the clustering of time series data are typically partitioning, hierarchical, or model-based algorithms. Aside from these conventional approaches, some recent articles emphasize the enhancement of algorithms and present customized models (typically hybrid methods) for time series data clustering. In one such work, the authors generate a time series network and propose a triangle distance measurement to calculate the similarity between time series data. To evaluate the accuracy of the proposed model, TTC is tested extensively using published time series datasets from diverse domains. This model is shown to be more accurate than any of the existing works and overcomes the limitations of conventional clustering algorithms in determining the clusters of time series data that are similar in shape.

Concepts and Definitions

The Proposed Algorithm

Step 1

Step 2

Analysis

Conclusion and Future Works