Time-Series Clustering Benchmark on Regional Economic Indicator

Yudhistira Dharma Putra

doi:10.33258/birci.v5i1.4374

Abstract

This paper presents a benchmark study of time-series clustering using regional economic data from the World Bank Open Data (WBOD) repository. It is intended as a reference point for future researchers. The study compares the effectiveness of twenty different techniques for grouping time series. The techniques combine three clustering algorithms (partitional, hierarchical, and fuzzy), two centroid types (K-means and K-medoids), and four distance measures (Dynamic Time Warping, Euclidean, Shape-Based Distance, and Triangular Global Alignment Kernel). An internal clustering validation index is used to compare the performance of the techniques. In addition, statistical tests are run on the performance of each pair of approaches to establish whether they differ significantly. Across all clustering algorithms evaluated, using K-means centroids outperformed using K-medoids. As for distance measures, all perform well across the clustering algorithms, with the Triangular Global Alignment Kernel performing best (except with fuzzy C-means). The study also finds that none of the approaches using the Dynamic Time Warping or Euclidean distance measures can be compared with another, because the corresponding Wilcoxon test results were not significant. Finally, Shape-Based Distance yields the most consistent results of all the clustering approaches.
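To illustrate why the choice of distance measure matters for time-series clustering, the following minimal Python sketch contrasts two of the four measures named above: lock-step Euclidean distance and Dynamic Time Warping. It is not the study's implementation, which the abstract does not specify. The function names `euclidean` and `dtw`, the absolute-difference local cost, the absence of a warping window, and the toy series are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Lock-step Euclidean distance; both series must have the same length."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Unconstrained Dynamic Time Warping distance (assumed local cost: |x - y|)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Two hypothetical indicator series with the same shape, shifted by one step
series_a = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
series_b = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]

print(euclidean(series_a, series_b))  # 2.0
print(dtw(series_a, series_b))        # 0.0
```

In this toy case the Euclidean distance penalizes the one-step shift, while Dynamic Time Warping aligns the two peaks and reports no difference. A clustering algorithm fed these distances could therefore group the same series differently depending on the measure.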
