Estimating data stream tendencies to adapt clustering parameters

Marcelo Keese Albertini,Rodrigo Mello

doi:10.1504/ijhpcn.2016.10009050

Abstract

A wide-range of applications based on processing of data streams have emerged in the last decade. They require specialised techniques to obtain representative models and extract information. Traditional data clustering algorithms have been adapted to include continuously arriving data by updating the current model. Most of data stream clustering algorithms aggregate new data into models according to parameters usually set by users. Problems arise when choosing the values of given parameters. When the phenomenon under study is stable, an analysis of a sample of the data stream or a priori knowledge can be used. However, when the behaviour changes over collection, parameters become obsolete and, consequently, the performance is degraded. In this paper, we study the problem of how to automatically adapt control parameters of data stream clustering algorithms. In this sense, we introduce a novel approach to estimate and use data tendencies in order to automatically modify control parameters. We present a proof of the convergence of our approach towards an ideal and unknown value of the control parameter. Experimental results confirm the estimation of data tendency improves learning control parameterisation.

Full Text