Abstract
On Twitter, the short length of posts forces users to be concise while conveying their main ideas to other users. The challenge, therefore, is how to extract information from these unstructured texts that can be valuable to organizations. We investigate the best methodology for summarizing microblog posts on topics discussed on Twitter. First, we classify the microblogs related to a topic as having positive, negative, or neutral sentiment; we then extract sub-topics (i.e., topic aspects) and pick the top N aspects ranked by sentiment temperature for the final summary. We apply known algorithms for annotation, sentiment analysis, and clustering to determine which combination yields the best results. This paper examines how sentiment analysis combined with aspect extraction can yield more effective topic summarization. Evaluation results show that sentiment analysis and aspect extraction improve the overall summarization of topics compared to a baseline technique.
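The abstract does not define how "sentiment temperature" is computed, so the following is only a minimal sketch under a stated assumption: each tweet already carries a polarity score in [-1, 1] (for example, derived from a lexicon such as SentiWordNet) and an aspect label, and an aspect's temperature is taken to be the mean polarity of its tweets. The function names `aspect_temperatures` and `top_n_aspects` are hypothetical. "Most neutral" is interpreted here as the temperature closest to zero.

```python
from collections import defaultdict
from statistics import mean


def aspect_temperatures(tweets):
    """Hypothetical 'sentiment temperature': the mean polarity of an aspect's tweets.

    tweets: iterable of (aspect, polarity) pairs, with polarity assumed in [-1.0, 1.0].
    """
    scores = defaultdict(list)
    for aspect, polarity in tweets:
        scores[aspect].append(polarity)
    return {aspect: mean(values) for aspect, values in scores.items()}


def top_n_aspects(temps, n, mode):
    """Return the n aspects ranked as most positive, most negative, or most neutral."""
    if mode == "positive":
        key = lambda a: -temps[a]      # highest temperature first
    elif mode == "negative":
        key = lambda a: temps[a]       # lowest temperature first
    elif mode == "neutral":
        key = lambda a: abs(temps[a])  # closest to zero first
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sorted(temps, key=key)[:n]


if __name__ == "__main__":
    tweets = [("battery", 0.8), ("battery", 0.6), ("screen", -0.7),
              ("price", 0.1), ("price", -0.1), ("camera", 0.4)]
    temps = aspect_temperatures(tweets)
    for mode in ("positive", "negative", "neutral"):
        print(mode, top_n_aspects(temps, 2, mode))
```

Averaging keeps the sketch simple; a weighted score (e.g., by tweet count or retweets) would be an equally plausible reading of the abstract.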
Highlights
The idea of microblogging occurred to Jack Dorsey of Odeo, Inc., when he and his team wanted to bring the concept of the Short Message Service (SMS) online, so that a user could broadcast a message to anyone or to a specific group of followers [Sagolla, 2009].
We present the results of our evaluation and discuss how constructing Word Graphs improves the overall summaries of the topics.
One popular automatic evaluation metric, adopted by the Document Understanding Conference (DUC), is ROUGE (Recall-Oriented Understudy for Gisting Evaluation).
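To illustrate what ROUGE measures, here is a minimal sketch of ROUGE-N recall: the fraction of reference-summary n-grams that also appear in the candidate summary, with each match clipped to the candidate's count. This is a simplified illustration, not the official ROUGE toolkit: it assumes lowercase whitespace tokenization and omits stemming, stopword handling, and the precision/F-measure variants. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1):
    """Simplified ROUGE-N recall of a candidate summary against reference summaries."""
    cand = ngrams(candidate.lower().split(), n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # Clip each reference n-gram match to how often it occurs in the candidate.
        matched += sum(min(count, cand[g]) for g, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


if __name__ == "__main__":
    cand = "the new phone has a great battery"
    refs = ["the phone battery is great"]
    print(rouge_n_recall(cand, refs, n=1))
    print(rouge_n_recall(cand, refs, n=2))
```

In the unigram case, four of the five reference words appear in the candidate, so recall is 0.8; no reference bigram appears in the candidate, so ROUGE-2 recall is 0.0.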
Summary
Our goal is to determine whether using Word Graphs to induce aspects improves the overall summarization process, and whether sentiment temperatures correctly rank aspects as most positive, most negative, or most neutral. To evaluate summaries before Word Graph construction (first workflow), each volunteer was given three sets of tweets for their assigned topic: positive, negative, and neutral. To evaluate summaries after Word Graph construction (second workflow), volunteers were again given a positive, a negative, and a neutral set of tweets for each topic, but each set was further divided into four subsets corresponding to the top four aspects ranked by sentiment temperature, as determined from SentiWordNet scores and aspect information. For each aspect, volunteers grouped the tweets into four clusters and picked a representative tweet from each cluster, yielding a four-sentence summary for that aspect. We used Sharifi's Phrase Reinforcement algorithm for comparison.
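Sharifi's Phrase Reinforcement algorithm, used above for comparison, builds a word graph around the topic phrase from the words that precede and follow it in each post, weights each node by how often it occurs, and reads off the most heavily weighted path as the summary. The sketch below is a deliberately simplified approximation, not Sharifi's exact method: it assumes whitespace tokenization and only the first occurrence of the topic per post, represents each side of the graph as a count-weighted prefix trie, scores a path by the sum of its node counts, and replaces the original node-weighting scheme with a hypothetical `min_count` pruning threshold.

```python
def build_trie(sequences):
    """Count-weighted prefix trie: maps each word prefix (tuple) to how many sequences share it."""
    counts = {}
    for seq in sequences:
        for i in range(1, len(seq) + 1):
            prefix = tuple(seq[:i])
            counts[prefix] = counts.get(prefix, 0) + 1
    return counts


def best_path(counts, min_count):
    """Exhaustively pick the path with the largest summed count, ignoring rare nodes."""
    best, best_score = (), 0
    for prefix, count in counts.items():
        if count < min_count:  # hypothetical pruning in place of the original weighting
            continue
        score = sum(counts[prefix[:i]] for i in range(1, len(prefix) + 1))
        if score > best_score:
            best, best_score = prefix, score
    return list(best)


def phrase_reinforcement(posts, topic, min_count=2):
    """Simplified Phrase Reinforcement summary centered on the topic phrase."""
    root = topic.lower().split()
    k = len(root)
    lefts, rights = [], []
    for post in posts:
        words = post.lower().split()
        for i in range(len(words) - k + 1):
            if words[i:i + k] == root:
                lefts.append(words[:i][::-1])  # words before the phrase, read outward
                rights.append(words[i + k:])   # words after the phrase
                break  # only the first occurrence per post
    left = best_path(build_trie(lefts), min_count)
    right = best_path(build_trie(rights), min_count)
    return " ".join(left[::-1] + root + right)


if __name__ == "__main__":
    posts = [
        "watching the game tonight with friends",
        "the game tonight was amazing",
        "can't wait for the game tonight",
        "the game was boring",
    ]
    print(phrase_reinforcement(posts, "game"))
```

With these four posts, "the" precedes "game" in every post and "tonight" follows it in three, while every longer continuation occurs only once, so the sketch returns "the game tonight".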