This study compares the performance of the stream clustering algorithms DenStream, CluStream, and ClusTree in Massive Online Analysis (MOA) using synthetic and real-world datasets. On the synthetic data, the algorithms are compared at noise levels of 0%, 10%, and 30%. The DenStream epsilon parameter was tuned to 0.01 and 0.03 to improve its performance. We evaluate performance with the metrics CMM, F1-P, F1-R, Purity, Silhouette Coefficient, and Rand statistic. On synthetic data, our results show that ClusTree outperformed CluStream and DenStream on almost all metrics, the exceptions being Purity and Silhouette Coefficient, where DenStream performed better at the 10% and 30% noise levels. ClusTree outperformed CluStream and DenStream on the Forest Cover Type dataset on the metrics CMM, F1-P, F1-R, Silhouette Coefficient, and Rand statistic, with 90%, 74%, 77% and 89% respectively. However, DenStream with the tuned epsilon parameter showed some improvement. On the electricity data, DenStream outperformed CluStream and ClusTree at epsilon values of 0.03 and 0.05 on the metrics F1-P, F1-R, and Purity. The investigation of the DenStream epsilon parameter (0.03 and 0.05) on the RandomRBF generator with noise levels of 0%, 10%, and 30% shows that DenStream with epsilon 0.03 outperformed the other parameter settings.