Implementation of time series data clustering based on SVD for stock data analysis on Hadoop platform

Yonghong Xie, Aziguli Wulamu, Yantao Wang, Zheng Liu

doi:10.1109/iciea.2014.6931498

Abstract

As data volumes grow, a viable solution is to process the data in parallel on a cluster consisting of a large number of computers, and the Hadoop parallel computing platform is a typical representative of this approach. Clustering analysis is one of the main methods for mining time series data; however, general clustering algorithms cannot cluster time series data directly because such data has a special structure. The time series clustering algorithm presented here combines the Canopy and K-means algorithms on the basis of singular value decomposition (SVD). SVD is first used to extract features from the time series data, and the Canopy and K-means algorithms are then used to cluster the extracted features. The algorithm is implemented on the Hadoop platform using Mahout, yielding a new clustering method that can handle massive time series data. Finally, this new clustering method is successfully applied to real stock time series data with satisfactory results.
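
The pipeline described in the abstract (SVD feature extraction, then Canopy to seed K-means) can be illustrated with a minimal single-machine sketch in NumPy. This is not the paper's Mahout/Hadoop implementation. The feature construction (projecting centered series onto the top-k right singular vectors), the function names, the thresholds `t1` and `t2`, the value of `k`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def svd_features(series, k):
    """Reduce each row (one time series) to k features via SVD (assumed scheme)."""
    centered = series - series.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T  # coordinates along the top-k right singular vectors

def canopy_centers(points, t1, t2):
    """Canopy pre-clustering: loose threshold t1 > tight threshold t2 > 0."""
    remaining = list(range(len(points)))
    centers = []
    while remaining:
        seed = points[remaining[0]]
        dists = np.linalg.norm(points[remaining] - seed, axis=1)
        # The canopy center is the mean of all points within the loose threshold.
        centers.append(points[remaining][dists < t1].mean(axis=0))
        # Points within the tight threshold cannot seed another canopy.
        remaining = [i for i, d in zip(remaining, dists) if d >= t2]
    return np.array(centers)

def nearest(points, centers):
    """Index of the closest center for every point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, centers, iterations=20):
    """Plain K-means (Lloyd's algorithm) started from the given centers."""
    for _ in range(iterations):
        labels = nearest(points, centers)
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return nearest(points, centers), centers

# Synthetic stand-in for stock series: 10 rising and 10 falling trends.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
rising = t + 0.05 * rng.standard_normal((10, 50))
falling = -t + 0.05 * rng.standard_normal((10, 50))
series = np.vstack([rising, falling])

features = svd_features(series, k=2)
seeds = canopy_centers(features, t1=4.0, t2=2.0)  # hypothetical thresholds
labels, centers = kmeans(features, seeds)
```

Canopy serves here only to choose the number of clusters and their starting centers, so K-means does not need a preset cluster count. In a Hadoop setting each of these steps would run as distributed jobs rather than in memory.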
