Abstract

Clustering data streams is an active research area that has recently emerged to discover knowledge from large amounts of continuously generated data. Several data stream clustering algorithms have been proposed to perform unsupervised learning. Nevertheless, data stream clustering poses several challenges, such as dealing with dynamic data that arrive in an online fashion, performing fast and incremental processing of data objects, and suitably addressing time and memory limitations. In this paper, we propose a semi-supervised clustering algorithm that extends Affinity Propagation (AP) to handle evolving data streams. We incorporate a set of labeled data items with the set of exemplars to detect a change in the generative process underlying the data stream, which requires the stream model to be updated as soon as possible. Experimental comparisons with state-of-the-art data stream clustering methods demonstrate the effectiveness and efficiency of the proposed method.

Keywords: affinity propagation, data streams, semi-supervised clustering
