Abstract

The hot topics discussed on microblogs mirror public opinion, so topic detection on microblogs is of great significance for detecting and managing public opinion. However, traditional clustering algorithms struggle with large-scale microblogging data, which covers many topics and contains a high level of noise. We therefore propose a three-layer hybrid algorithm to tackle this problem. In the first layer, we use the $K$-means algorithm with optimized initial center selection to group the microblog texts efficiently. We then subdivide big clusters and isolate noise texts to obtain purer clusters. In the second layer, we adopt the agglomerative nesting (AGNES) algorithm to merge the small clusters that refer to the same topic. We then exclude most of the noise, reducing its impact on the $K$-means step in the third layer, which corrects the erroneous merges made by AGNES. Experiments show that our algorithm outperforms some related traditional algorithms in clustering a real microblogging data set and performs well in topic detection.
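The three-layer flow described above can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions: toy 2-D points stand in for vectorized microblog texts, farthest-first traversal stands in for the unspecified optimized center selection, AGNES is approximated by centroid-distance merging with a threshold, noise is taken to be any merged cluster smaller than `min_size`, and the first-layer subdivision of big clusters is omitted. All function and parameter names (`three_layer`, `k_over`, `merge_threshold`, `min_size`) are hypothetical.

```python
import math


def mean(points):
    """Component-wise centroid of a non-empty list of equal-length tuples."""
    return tuple(sum(xs) / len(points) for xs in zip(*points))


def assign(points, centers):
    """Put each point into the group of its nearest center."""
    groups = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        groups[nearest].append(p)
    return groups


def kmeans(points, centers, iters=20):
    """Plain Lloyd iterations from the given initial centers."""
    for _ in range(iters):
        groups = assign(points, centers)
        # An empty group keeps its previous center.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return [g for g in assign(points, centers) if g]


def farthest_first_centers(points, k):
    """Assumed stand-in for the optimized initial center selection."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def agnes_merge(clusters, threshold):
    """Repeatedly merge the two clusters with the closest centroids."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        pairs = [(math.dist(mean(clusters[i]), mean(clusters[j])), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        d, i, j = min(pairs)
        if d > threshold:  # nothing left that is close enough to merge
            break
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


def three_layer(points, k_over, merge_threshold, min_size):
    # Layer 1: over-segment with K-means so that clusters stay pure.
    fine = kmeans(points, farthest_first_centers(points, k_over))
    # Layer 2: merge small clusters on the same topic, then drop noise.
    merged = agnes_merge(fine, merge_threshold)
    kept = [c for c in merged if len(c) >= min_size]
    noise = [p for c in merged if len(c) < min_size for p in c]
    # Layer 3: K-means on the remaining points, seeded with merged centroids.
    remaining = [p for c in kept for p in c]
    topics = kmeans(remaining, [mean(c) for c in kept])
    return topics, noise


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, -50)]
topics, noise = three_layer(points, k_over=4, merge_threshold=3.0, min_size=2)
print(len(topics), noise)
```

On this toy input, the first layer splits one of the two dense groups, the merging step rejoins it, and the isolated point at `(50, -50)` is set aside before the final $K$-means pass. Real microblog texts would first need a vector representation, which is outside the scope of this sketch.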
