Abstract

Due to the popularity of social media and online fora, such as Twitter, Reddit, Facebook, and Wechat, short text stream clustering has gained significant attention in recent years. However, most existing short text stream clustering approaches usually work on static data and tend to cause a "term ambiguity" problem due to the sparse word representation. Beyond, they often exploit short text streams in a batch way and are difficult to find evolving topics in term-changing subspaces. In this article, we propose an online semantic-enhanced graphical model for evolving short text stream clustering (OSGM), by exploiting the word-occurrence semantic information and dynamically maintaining evolving active topics in term-changing subspaces in an online way. Compared to the existing approaches, our online model is not only free of determining the optimal batch size but also lends itself to handling large-scale data streams efficiently. It is also able to handle the "term ambiguity" problem without incorporating features from external resources. More importantly, to the best of our knowledge, it is the first work to extract evolving topics in term-changing subspaces automatically in an online way. Extensive experiments demonstrate that the proposed model yields better performance compared to many state-of-the-art algorithms on both synthetic and real-world datasets.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.