Abstract

A data stream is a massive, unbounded sequence of data elements generated continuously at a rapid rate. For this reason, most data stream algorithms trade some accuracy in their results for fast processing. This paper proposes a clustering method for data streams based on statistical μ-partition. The multi-dimensional space of the data domain is divided into a set of mutually exclusive, equal-size initial cells. Each cell maintains the distribution statistics of the data elements in its range. Based on these statistics, a dense cell is dynamically split into two mutually exclusive smaller cells called intermediate cells. The dense sub-range of an initial cell is partitioned recursively in this way until it reaches the smallest cell size, called a unit cell. A cluster of a data stream is a group of adjacent dense unit cells. The smaller the unit cell, the more accurately the resulting clusters are identified. The performance of the proposed algorithm is analyzed comparatively through a series of experiments.
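
The partitioning idea can be illustrated with a minimal one-dimensional sketch. It is not the paper's algorithm: the abstract does not give the split rule or the density thresholds, so the sketch assumes that a cell splits at the running mean of its elements (clamped so that neither child is narrower than half a unit cell), that child cells restart their statistics from zero, and that density is judged by absolute counts. The names `Cell`, `GridClusterer`, `unit_width`, `split_count`, and `dense_count` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    lo: float
    hi: float
    count: int = 0      # number of elements seen in [lo, hi)
    total: float = 0.0  # running sum of those elements, used for the mean

    @property
    def width(self) -> float:
        return self.hi - self.lo


class GridClusterer:
    """1-D illustration only; parameters and split rule are assumptions."""

    def __init__(self, lo, hi, n_initial, unit_width, split_count, dense_count):
        step = (hi - lo) / n_initial
        # Equal-size, mutually exclusive initial cells covering [lo, hi).
        self.cells = [Cell(lo + i * step, lo + (i + 1) * step)
                      for i in range(n_initial)]
        self.unit_width = unit_width
        self.split_count = split_count
        self.dense_count = dense_count

    def insert(self, x):
        # Locate the cell whose half-open range contains x.
        i = next(k for k, c in enumerate(self.cells) if c.lo <= x < c.hi)
        cell = self.cells[i]
        cell.count += 1
        cell.total += x
        # A dense cell that is still larger than a unit cell is split in two.
        if cell.width > self.unit_width and cell.count >= self.split_count:
            mean = cell.total / cell.count
            half = self.unit_width / 2
            # Assumed rule: split at the mean, clamped away from the edges.
            mid = min(max(mean, cell.lo + half), cell.hi - half)
            # Assumption: children start with empty statistics.
            self.cells[i:i + 1] = [Cell(cell.lo, mid), Cell(mid, cell.hi)]

    def clusters(self):
        # Merge runs of adjacent dense unit cells into (lo, hi) intervals.
        found, run = [], None
        for c in self.cells:
            if c.width <= self.unit_width and c.count >= self.dense_count:
                run = (run[0], c.hi) if run else (c.lo, c.hi)
            elif run:
                found.append(run)
                run = None
        if run:
            found.append(run)
        return found


g = GridClusterer(0.0, 1.0, n_initial=4, unit_width=0.07,
                  split_count=20, dense_count=10)
for i in range(2000):
    g.insert(0.40 + 0.005 * (i % 11))  # synthetic stream packed into [0.40, 0.45]
print(g.clusters())
```

In this sketch only the initial cell that receives the synthetic data is refined, while the empty initial cells stay coarse. That mirrors the abstract's point that partitioning effort is spent only on dense sub-ranges.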