Abstract

This study compares the performance of three stream clustering algorithms (DenStream, CluStream, and ClusTree) in the Massive Online Analysis (MOA) framework using synthetic and real-world datasets. On the synthetic data, the algorithms are compared at noise levels of 0%, 10%, and 30%. The DenStream epsilon parameter was tuned to 0.01 and 0.03 to improve its performance. We use the evaluation metrics CMM, F1-P, F1-R, Purity, Silhouette Coefficient, and Rand statistic. On synthetic data, our results show that ClusTree outperforms CluStream and DenStream on almost all metrics, except Purity and Silhouette Coefficient, where DenStream performs better at noise levels of 10% and 30%. ClusTree outperforms CluStream and DenStream on the Forest Cover Type dataset on the metrics CMM, F1-P, F1-R, Silhouette Coefficient, and Rand statistic, with 90%, 74%, 77% and 89% respectively. However, tuning the DenStream epsilon parameter yields some improvement. On the Electricity dataset, DenStream outperforms CluStream and ClusTree at epsilon values of 0.03 and 0.05 on the metrics F1-P, F1-R, and Purity. An investigation of the DenStream epsilon parameter (0.03 and 0.05) on the RandomRBF generator with noise levels of 0%, 10%, and 30% shows that DenStream with epsilon 0.03 outperforms the other parameter setting.
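
For readers unfamiliar with two of the metrics named in this abstract, the following minimal Python sketch shows the textbook definitions of Purity and the Rand statistic on a toy labelling. It is an illustration only and not MOA's implementation, which evaluates clusterings over stream windows; the function names and the toy labels are assumptions introduced here.

```python
from collections import Counter
from itertools import combinations


def purity(labels_true, labels_pred):
    """Fraction of points that belong to the majority class of their cluster."""
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, []).append(t)
    # For each found cluster, count the points of its most common true class.
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority_total / len(labels_true)


def rand_statistic(labels_true, labels_pred):
    """Fraction of point pairs on which the two partitions agree."""
    pairs = list(combinations(range(len(labels_true)), 2))
    # A pair agrees if it is grouped together in both partitions
    # or separated in both partitions.
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical toy data: six points, two true classes, two found clusters.
truth = [0, 0, 0, 1, 1, 1]
found = [0, 0, 1, 1, 1, 1]
print(purity(truth, found))
print(rand_statistic(truth, found))
```

Both functions return values in [0, 1], with higher values indicating closer agreement with the ground-truth labels. The pairwise loop is quadratic in the number of points, so this form is suitable only for small examples.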
