Abstract

Twitter is an extremely high-volume platform for user-generated contributions on any topic. The wealth of content created in real time and in massive quantities calls for automated approaches to identify the topics of the contributions. Such topics can be utilized in numerous ways, such as public opinion mining, marketing, entertainment, and disaster management. Towards this end, approaches that relate single or partial posts to knowledge base items have been proposed. However, in microblogging systems like Twitter, topics emerge from the culmination of a large number of contributions. Therefore, it is necessary to identify topics based on collections of posts, where individual posts contribute to some aspect of the greater topic. Models such as Latent Dirichlet Allocation (LDA) relate collections of posts to sets of keywords that represent underlying topics. In these approaches, determining which specific topic(s) the keyword sets represent remains a separate task. Another issue in topic detection is scope, which is often limited to a specific domain, such as health. This work proposes an approach for identifying domain-independent, specific topics related to sets of posts. In this approach, individual posts are processed and then aggregated to identify key tokens, which are then mapped to specific topics. Wikipedia article titles are selected to represent topics, since Wikipedia articles are up to date, user generated, and sophisticated, and they span topics of human interest. This paper describes the proposed approach, a prototype implementation, and a case study based on data gathered during the periods of heavy contribution corresponding to the four US election debates in 2012. The manually evaluated results (0.96 precision) and other observations from the study are discussed in detail.
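The flow outlined in the abstract (process individual posts, aggregate them to find key tokens, map those tokens to Wikipedia article titles) can be illustrated with a minimal sketch. This is not the authors' prototype. The title index `WIKI_TITLES`, the stopword list, the `min_share` threshold, and all function names are hypothetical choices made only for illustration, and simple n-gram counting stands in for whatever processing the paper actually uses.

```python
import re
from collections import Counter

# Hypothetical stand-in for a Wikipedia title index (assumption, not from the paper).
# Keys are lowercase surface forms; values are article titles.
WIKI_TITLES = {
    "obama": "Barack Obama",
    "romney": "Mitt Romney",
    "health care": "Health care",
}

# Tiny illustrative stopword list (assumption).
STOPWORDS = {"a", "an", "the", "on", "about", "and", "of", "to", "in"}


def tokenize(post):
    """Lowercase a post, drop URLs and @mentions, and return non-stopword tokens."""
    cleaned = re.sub(r"https?://\S+|@\w+", " ", post.lower())
    return [t for t in re.findall(r"[a-z]+", cleaned) if t not in STOPWORDS]


def key_ngrams(posts, min_share=0.5):
    """Aggregate unigrams and bigrams over the post set.

    Each n-gram is counted at most once per post. An n-gram is kept when it
    occurs in at least `min_share` of the posts. Bigrams are built after
    stopword removal, so they may span a removed stopword.
    """
    counts = Counter()
    for post in posts:
        tokens = tokenize(post)
        grams = set(tokens) | {" ".join(pair) for pair in zip(tokens, tokens[1:])}
        counts.update(grams)
    threshold = min_share * len(posts)
    return {gram for gram, count in counts.items() if count >= threshold}


def topics_for(posts, titles=WIKI_TITLES, min_share=0.5):
    """Map the key n-grams of a post set to titles in the (hypothetical) index."""
    return sorted(titles[g] for g in key_ngrams(posts, min_share) if g in titles)


if __name__ == "__main__":
    sample_posts = [
        "Obama talks about health care #debate",
        "Romney attacks Obama on health care http://t.co/x",
        "@friend watching the debate, Obama looks calm",
    ]
    print(topics_for(sample_posts))
```

In this sketch a token that appears in only one post (here, "romney") falls below the threshold and is not mapped, which mirrors the idea that topics emerge from the post set rather than from any single post.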

Highlights

  • Twitter [1] is the most popular microblogging system in the world with over 280 million active users tweeting around 40K posts/s [2]

  • We propose to use the titles of Wikipedia articles to represent topics

  • Relying on the limited context of short individual microblog posts and discarding the descriptive content of Wikipedia article bodies may lead to less inclusive and less descriptive topics. We show this in the "Comparison of processing single-microblog posts and microblog post sets" section, where we examine some cases by comparing the results of an approach that aggregates what [34, 35] returns with those of our proposed approach


Summary

Introduction

Twitter [1] is the most popular microblogging system in the world with over 280 million active users tweeting around 40K posts/s [2]. It serves as a collective platform where users tweet (post) anything about anything [3], such as current events, sports, politics, health, conferences, and personal life. This way, the author makes a connection between Obama’s words and the context of the debate. He can add his opinion on the subject if he wants.

