The ever-growing volume of data streams calls for new methods to process and analyze them. Two of the most challenging aspects of data streams are (i) concept drift, i.e. the evolution of the data stream over time, which requires timely decisions despite the high arrival rate of new data; and (ii) limited memory, which makes storing the full stream impractical given the large amount of data. Clustering is one of the common methods for processing data streams. In recent years, a number of methods capable of clustering data streams have been proposed. Their main limitation is that their parameters must be set using expert knowledge. In this paper, we propose a novel, fully online, density-based method for clustering evolving data streams, and this work is among the first to address that limitation. The proposed method, Constant False Clustering Probability (CFCP), determines the values of its parameters from statistical theory and requires no additional information from expert knowledge. It can identify clusters of arbitrary shape, is robust to noise, and offers high accuracy and efficiency in both low and high dimensions. The experimental results show that the method clusters data at high speed without reducing quality compared to state-of-the-art algorithms.