A new approach to clustering time series data using the ARIMA model uncertainty

Triyani Hendrawati, Aji Hamim Wigena, I Made Sumertajaya, Bagus Sartono

doi:10.28919/cmbn/4778

Abstract

The Piccolo distance is a simple approach to clustering time-series data, but it requires the analyst to specify an ARIMA model for each series. Problems can arise at this modeling step because different selection criteria may lead to different orders for the best model. This paper addresses this model-uncertainty problem by borrowing the concept of an ensemble from data science. Instead of relying on a single criterion to identify the best model, we propose selecting a best model under each of several criteria and characterizing each series by the average of the resulting parameter estimates. In the clustering step, we employ a hierarchical approach in which the optimal number of clusters is identified using the silhouette coefficient. An extensive simulation study shows that the proposed methodology can increase the rate of correct cluster membership by more than 10%. We also apply the methodology to identify clusters of areas in the Province of Banten, Indonesia, based on rainfall patterns, with encouraging results.

Keywords: model-based clustering; uncertainty model; time series clustering; rainfall data.
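The pipeline summarized in the abstract (several selection criteria per series, averaged parameter estimates, Piccolo distance, hierarchical clustering, silhouette-based choice of the number of clusters) can be illustrated with a minimal, dependency-free sketch. It rests on simplifying assumptions that are not taken from the paper: each series is fitted only with pure AR(p) models via Yule-Walker (Levinson-Durbin recursion), so the AR coefficients coincide with the π-weights the Piccolo distance compares; the "several criteria" are assumed to be AIC, BIC and Hannan-Quinn in their Gaussian-approximation form; and average linkage is assumed for the hierarchical step. All function names (`ensemble_coeffs`, `cluster_series`, `simulate_ar1`, etc.) are hypothetical and do not reproduce the authors' code.

```python
import math
import random

# Sketch assumptions (not from the paper): pure AR(p) fits, so AR coefficients are the
# pi-weights compared by the Piccolo distance; criteria are Gaussian-approx AIC, BIC, HQ.
CRITERIA = {
    "AIC": lambda p, n: 2.0 * p,
    "BIC": lambda p, n: p * math.log(n),
    "HQ": lambda p, n: 2.0 * p * math.log(math.log(n)),
}

def autocov(x, max_lag):
    """Biased sample autocovariances r[0..max_lag] of the demeaned series."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[t] * d[t + h] for t in range(n - h)) / n for h in range(max_lag + 1)]

def levinson_durbin(r, p_max):
    """Yule-Walker AR fits for orders 0..p_max as (coefficients, innovation variance)."""
    fits = [([], r[0])]
    phi, sigma2 = [], r[0]
    for k in range(1, p_max + 1):
        kappa = (r[k] - sum(phi[j - 1] * r[k - j] for j in range(1, k))) / sigma2
        phi = [phi[j - 1] - kappa * phi[k - j - 1] for j in range(1, k)] + [kappa]
        sigma2 *= 1.0 - kappa ** 2
        fits.append((phi, sigma2))
    return fits

def ensemble_coeffs(x, p_max=5):
    """Average the zero-padded AR coefficients of the order each criterion selects."""
    n = len(x)
    fits = levinson_durbin(autocov(x, p_max), p_max)
    chosen = []
    for penalty in CRITERIA.values():
        scores = [n * math.log(s2) + penalty(p, n) for p, (_, s2) in enumerate(fits)]
        coeffs = fits[scores.index(min(scores))][0]
        chosen.append(coeffs + [0.0] * (p_max - len(coeffs)))
    return [sum(c[j] for c in chosen) / len(chosen) for j in range(p_max)]

def piccolo(a, b):
    """Euclidean distance between two (zero-padded) pi-weight vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def average_linkage(D, k):
    """Agglomerative clustering with average linkage, stopped at k clusters."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(D[a][b] for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected
    return [next(c for c, m in enumerate(clusters) if i in m) for i in range(len(D))]

def silhouette(D, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    n, total = len(D), 0.0
    for i in range(n):
        same = [D[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            continue
        a = sum(same) / len(same)
        b = min(
            sum(D[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n

def cluster_series(series, k_max=5, p_max=5):
    """Choose k in 2..k_max by the silhouette coefficient; return (k, labels)."""
    feats = [ensemble_coeffs(x, p_max) for x in series]
    D = [[piccolo(u, v) for v in feats] for u in feats]
    ks = range(2, min(k_max, len(series) - 1) + 1)
    best_k = max(ks, key=lambda k: silhouette(D, average_linkage(D, k)))
    return best_k, average_linkage(D, best_k)

def simulate_ar1(phi, n, rng, burn=100):
    """Gaussian AR(1) series of length n after discarding a burn-in."""
    x, out = 0.0, []
    for t in range(n + burn):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn:
            out.append(x)
    return out

if __name__ == "__main__":
    rng = random.Random(42)
    series = [simulate_ar1(phi, 200, rng) for phi in [0.8] * 4 + [-0.5] * 4]
    print(cluster_series(series))
```

With eight simulated series from two AR(1) processes, the sketch should recover two groups. The Levinson-Durbin recursion is used here only because it yields fits for every order up to `p_max` in one pass; a closer match to the ARIMA setting described in the abstract would fit full ARIMA models with a statistics library and convert them to π-weights before averaging.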
