Abstract

This paper proposes a new online-offline density-based clustering method for data streams with varying density. In the online phase, a summary of the data is created in the form of micro-clusters, and in the offline phase this synopsis is used to form the final clusters. The goal of the online phase is to find accurate micro-clusters. When a new data point arrives, finding the nearest, best-fitting micro-cluster is time-consuming and can increase the execution time. To address this problem, a new merging algorithm is proposed. To keep the number of micro-clusters limited, a pruning process is applied alongside the summarization process. In existing methods, this pruning process takes too long to remove micro-clusters that do not receive objects frequently, which increases memory usage. To solve this problem, a new pruning algorithm is introduced. Another problem with density-based methods is that they use global parameters on data sets with varying density, which can dramatically decrease the clustering quality. To create the final clusters, we propose a new density-based algorithm that relies only on the MinPts parameter, with the aim of increasing the clustering quality on data sets with varying density. A performance evaluation on both synthetic and real data sets illustrates the efficiency and effectiveness of the proposed method. The experimental results show that our method can increase the clustering quality on data sets with varying density while keeping time and memory usage limited.
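
The online phase described above can be illustrated with a minimal sketch of generic micro-cluster maintenance. This is not the paper's merging or pruning algorithm, whose details the abstract does not give. It only shows the general pattern of absorbing a point into the nearest micro-cluster and pruning micro-clusters that have not received points recently. The names `MicroCluster` and `update`, the exponential decay rule, and the parameters `radius`, `decay`, and `min_weight` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Hypothetical summary of nearby stream points: a center and a fading weight."""
    center: list
    weight: float = 1.0
    last_update: float = 0.0

    def decayed_weight(self, now, decay):
        # Assumed fading rule: the weight halves every 1/decay time units.
        return self.weight * 2.0 ** (-decay * (now - self.last_update))

    def absorb(self, point, now, decay):
        # Fade the old weight, then move the center toward the new point.
        w = self.decayed_weight(now, decay)
        new_w = w + 1.0
        self.center = [(c * w + p) / new_w for c, p in zip(self.center, point)]
        self.weight = new_w
        self.last_update = now


def update(clusters, point, now, radius=1.0, decay=0.1, min_weight=0.25):
    """Insert one point, then prune micro-clusters whose faded weight is too low."""
    nearest = min(clusters, key=lambda mc: math.dist(mc.center, point), default=None)
    if nearest is not None and math.dist(nearest.center, point) <= radius:
        nearest.absorb(point, now, decay)
    else:
        clusters.append(MicroCluster(list(point), 1.0, now))
    # Pruning step: drop summaries that have not received points recently.
    clusters[:] = [mc for mc in clusters if mc.decayed_weight(now, decay) >= min_weight]
    return clusters
```

In this sketch the nearest-cluster search is a linear scan, which is the kind of per-point cost the abstract identifies as time-consuming, and the pruning pass runs on every insertion. A practical implementation would likely replace both with cheaper strategies.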
