Abstract
Microblogging platforms, such as Twitter, have already played an important role in recent cultural, social and political events. Discovering latent topics from social streams is therefore important for many downstream applications, such as clustering, classification or recommendation. However, traditional topic models that rely on the bag-of-words assumption are insufficient to uncover the rich semantics and temporal aspects of topics in Twitter. In particular, microblog content is often influenced by external information sources, such as Web documents linked from Twitter posts, and often focuses on specific entities, such as people or organizations. These external sources provide useful semantics to understand microblogs and we generally refer to these semantics as auxiliary semantics . In this article, we address the mentioned issues and propose a unified framework for Multifaceted Topic Modeling from Twitter streams. We first extract social semantics from Twitter by modeling the social chatter associated with hashtags. We further extract terms and named entities from linked Web documents to serve as auxiliary semantics during topic modeling. The Multifaceted Topic Model (MfTM) is then proposed to jointly model latent semantics among the social terms from Twitter, auxiliary terms from the linked Web documents and named entities. Moreover, we capture the temporal characteristics of each topic. An efficient online inference method for MfTM is developed, which enables our model to be applied to large-scale and streaming data. Our experimental evaluation shows the effectiveness and efficiency of our model compared with state-of-the-art baselines. We evaluate each aspect of our framework and show its utility in the context of tweet clustering.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.