Abstract

We study the influence of different clustering algorithms on cluster evolution monitoring in data streams. Capturing and interpreting cluster change yields indicators of the evolution of the underlying population. For text stream monitoring, the clusters can be summarized into topics, so that cluster monitoring provides insights into the emergence and decline of thematic subjects over time. However, such insights should always be taken with a grain of salt: the quality of the clusters has a decisive impact on the observed changes. In the simplest case, cluster change across the stream may be due more to the low quality of the original cluster than to a drift in the population belonging to this cluster. We present our framework ThemeFinder for topic evolution monitoring in streams and compare how two very different clustering algorithms influence the quality of the monitored clusters. After evaluating different clustering algorithms with external and internal quality measures, we use the center-based bisecting k-means algorithm and the density-based DBSCAN algorithm. Our results show that the influence is relatively high, and that the results of one clustering algorithm allow conclusions to be drawn about the evaluation of the other. Our experiments were conducted on a subarchive of the ACM library.
