Abstract

Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. As a result, they yield poor clustering accuracy in several systems. This paper proposes a new hybrid clustering algorithm based on the similarity in shape of time series data. Time series data are first grouped into subclusters based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. The model makes two contributions: (1) it is more accurate than other conventional and hybrid approaches, and (2) it determines the similarity in shape among time series data with low complexity. To evaluate its accuracy, the model is tested extensively on synthetic and real-world time series datasets.
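The two-step idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the following choices are assumptions made only for illustration. Euclidean distance stands in for "similarity in time", dynamic time warping (DTW) stands in for "similarity in shape", Step 1 is a simple leader clustering with a hypothetical `radius` threshold, each subcluster is represented by its medoid, and k-Medoids uses a deterministic farthest-first initialisation. The helper names (`subcluster`, `hybrid_cluster`, `medoid_index`) are hypothetical.

```python
from typing import Callable, List, Sequence

Series = Sequence[float]
Distance = Callable[[Series, Series], float]


def euclidean(a: Series, b: Series) -> float:
    """Lock-step distance; used here (assumption) for 'similarity in time'."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def dtw(a: Series, b: Series) -> float:
    """Dynamic time warping; used here (assumption) for 'similarity in shape'."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] ** 0.5


def medoid_index(points: List[Series], idx: List[int], dist: Distance) -> int:
    """Index (from idx) whose series has the smallest total distance to the others."""
    return min(idx, key=lambda i: sum(dist(points[i], points[j]) for j in idx))


def subcluster(data: List[Series], radius: float) -> List[List[int]]:
    """Step 1 (hypothetical leader clustering): join the first group whose
    leader is within `radius` in Euclidean distance, else start a new group."""
    groups: List[List[int]] = []
    for i, s in enumerate(data):
        for g in groups:
            if euclidean(s, data[g[0]]) <= radius:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups


def k_medoids(points: List[Series], k: int, dist: Distance, iters: int = 20) -> List[int]:
    """Step 2: alternating k-Medoids; returns one cluster label per point."""
    k = min(k, len(points))
    # Farthest-first initialisation (assumption; keeps the sketch deterministic).
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(len(points)),
                           key=lambda i: min(dist(points[i], points[m]) for m in medoids)))
    labels: List[int] = []
    for _ in range(iters):
        # Assign every point to its nearest medoid.
        labels = [min(range(k), key=lambda c: dist(p, points[medoids[c]])) for p in points]
        # Re-pick each medoid as the member minimising total in-cluster distance.
        new = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c] or [medoids[c]]
            new.append(medoid_index(points, members, dist))
        if new == medoids:
            break
        medoids = new
    return labels


def hybrid_cluster(data: List[Series], k: int, radius: float) -> List[int]:
    """Subclusters by time, medoid prototypes, then k-Medoids on prototypes by shape."""
    groups = subcluster(data, radius)
    prototypes = [data[medoid_index(data, g, euclidean)] for g in groups]
    proto_labels = k_medoids(prototypes, k, dtw)
    labels = [0] * len(data)
    for g, lab in zip(groups, proto_labels):
        for i in g:
            labels[i] = lab  # every member inherits its prototype's cluster
    return labels


data = [
    [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5.2], [0, 0, 1, 2, 3, 4],  # rising
    [5, 4, 3, 2, 1, 0], [5, 5, 4, 3, 2, 1], [5, 4.1, 3, 2, 1, 0],  # falling
]
print(hybrid_cluster(data, k=2, radius=1.0))  # [0, 0, 0, 1, 1, 1]
```

In this toy run, the time-shifted series `[0, 0, 1, 2, 3, 4]` lands in its own subcluster in Step 1 (it is far from `[0, 1, 2, 3, 4, 5]` point by point), but DTW places its prototype with the other rising series in Step 2. Only the few prototypes are compared with the more expensive shape measure, which is one plausible reading of the abstract's low-complexity claim.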

Highlights

  • Clustering is considered the most important unsupervised learning problem

  • The proposed model is evaluated on one synthetic dataset and 12 real-world datasets of various domains and sizes, obtained from the UCR Time Series Data Mining Archive [64]

  • We illustrated the advantages of using some time series data as prototypes to cluster time series data based on the similarity in shape
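One reason an actual time series can be a better prototype than an averaged one is easy to see on a toy case. The sketch below is an illustration under assumptions, not the authors' experiment: it compares a point-wise mean prototype with a medoid prototype (the member series with the smallest total Euclidean distance to the others) for three copies of the same spike shifted in time. The function names are hypothetical.

```python
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Point-by-point distance between equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def mean_prototype(members: List[Sequence[float]]) -> List[float]:
    """Point-wise average: a synthetic series that need not resemble any member."""
    n = len(members)
    return [sum(col) / n for col in zip(*members)]


def medoid_prototype(members: List[Sequence[float]]) -> Sequence[float]:
    """An actual member: the one with the smallest total distance to the rest."""
    return min(members, key=lambda c: sum(euclidean(c, o) for o in members))


# Three copies of the same spike, each shifted by one time step.
bumps = [[0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 1, 0]]
mean_p = mean_prototype(bumps)    # [0.0, 0.333..., 0.333..., 0.333..., 0.0]
med_p = medoid_prototype(bumps)   # [0, 1, 0, 0, 0]
print(mean_p, med_p)
```

Averaging smears the spike into a low plateau that matches none of the inputs, whereas the medoid keeps the spike's shape because it is one of the inputs.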


Summary

Introduction

Clustering is considered the most important unsupervised learning problem. Clustering time series data is useful for exploratory data analysis and for generating summaries. Conventional approaches to time series clustering are typically partitioning, hierarchical, or model-based algorithms. Beyond these conventional approaches, several recent articles focus on enhancing existing algorithms and present customized models (typically hybrid methods) for time series clustering. To generate the time series network, the authors propose a triangle distance measurement to calculate the similarity between time series. To evaluate its accuracy, the proposed model, TTC, is tested extensively on published time series datasets from diverse domains. The model is shown to be more accurate than the existing works it is compared against, and it overcomes the limitations of conventional clustering algorithms in identifying clusters of time series that are similar in shape.

Concepts and Definitions
The Proposed Algorithm
Step 1
Step 2
Analysis
Conclusion and Future Works
