Abstract

Data mining has become a key ingredient in building intelligent decision support systems. As one of the main branches of data mining, data stream clustering has received much attention over the past decade. Most existing data stream clustering techniques rely on the Euclidean distance metric to find similar objects and hence produce spherical clusters, which are not always suitable for representing the data. Moreover, most real-world problems involve data of varying density, which density-based clustering techniques cannot handle. In this paper, we introduce a new clustering technique called Hyper-Ellipsoidal Clustering for Evolving data Stream (HECES), based on the recently proposed HyCARCE algorithm. HECES modifies HyCARCE in three ways to handle the stream clustering problem: a sliding window model processes the incoming data stream and minimizes the impact of obsolete information on recent clustering results; a shrinkage technique avoids the singularity issue that arises when estimating the covariance of correlated data; and a novel technique for merging the initial ellipsoids produces the final clusters, replacing a computationally intensive process of expansion and adjustment. HECES relies on the Mahalanobis distance metric to cluster the data points and hence produces ellipsoidal clusters. It can successfully handle data of varying density. Experiments on various synthetic and real datasets for clustering streaming data provide a comparative validation of our approach.
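The three ingredients named above (a sliding window, a shrunk covariance estimate, and the Mahalanobis distance) can be illustrated with a minimal Python sketch. This is not the HECES or HyCARCE implementation. The function names, the window size, the shrinkage weight `alpha`, and the choice of a scaled identity matrix as the shrinkage target are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import deque


def mean_and_covariance(points):
    """Sample mean and unbiased sample covariance of a list of points."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov


def shrink(cov, alpha=0.1):
    """Blend cov with a scaled identity: (1 - alpha) * S + alpha * mu * I.

    mu is the average variance (trace / d). The target and alpha are
    assumptions of this sketch, not taken from the paper.
    """
    d = len(cov)
    mu = sum(cov[i][i] for i in range(d)) / d
    return [[(1 - alpha) * cov[i][j] + (alpha * mu if i == j else 0.0)
             for j in range(d)] for i in range(d)]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance (x - mean)^T cov^{-1} (x - mean)."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    return sum(di * yi for di, yi in zip(diff, solve(cov, diff)))


# Sliding window: only the 5 most recent points are kept (size is hypothetical).
window = deque(maxlen=5)
for t in range(8):
    window.append((float(t), 2.0 * t))  # perfectly correlated stream: y = 2x

mean, cov = mean_and_covariance(list(window))  # raw covariance is singular here
cov_shrunk = shrink(cov, alpha=0.1)            # shrinkage makes it invertible

# Two points at the same Euclidean distance from the mean:
on_axis = mahalanobis_sq((mean[0] + 1, mean[1] + 2), mean, cov_shrunk)
off_axis = mahalanobis_sq((mean[0] + 2, mean[1] - 1), mean, cov_shrunk)
```

In this toy stream, `off_axis` comes out much larger than `on_axis` even though both test points are equally far from the mean in Euclidean terms, which is the property that lets a Mahalanobis-based method describe elongated, ellipsoidal clusters.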
