Abstract
Detecting concept drift makes it possible to pinpoint when a data stream changes its behavior over time, which supports further analysis of why the phenomenon represented by the data has changed. Researchers have recently approached concept drift with unsupervised learning strategies, because data streams are open-ended sequences of data that are extremely hard to label. These approaches usually compute divergences between consecutive models obtained over time. However, they tend to be imprecise, because the models are produced by clustering algorithms that do not hold any stability property. A clustering algorithm that holds a stability property guarantees that a change in the clustering model corresponds to an actual change in the input data. This drawback motivates the present work, which proposes a new approach to modeling data streams with a stable hierarchical clustering algorithm. Our approach also considers data streams composed of a mixture of time-dependent and time-independent observations. Experiments were conducted on synthetic data streams exhibiting different behaviors. The results confirm that the new approach is capable of detecting concept drift in data streams.