Abstract

On Twitter, the short length of posts forces users to be concise while still conveying their main ideas to other users. The challenge, then, is how to extract information from these unstructured texts that can be valuable to organizations. We investigate the best methodology for microblog summarization of topics discussed on Twitter. First, we classify the microblogs related to a topic as positive, negative, or neutral in sentiment; then we extract sub-topics (i.e., topic aspects) and pick the top N aspects ranked by sentiment temperature for the final summarization. We apply known algorithms for annotation, sentiment analysis, and clustering to determine which combination yields the best results. This paper examines how sentiment analysis, in conjunction with aspect extraction of topics, can yield more effective summarization. Evaluation results show that sentiment analysis and aspect extraction improve the overall summarization of topics compared to a baseline technique.

Highlights

  • The idea of microblogging occurred to Jack Dorsey of Odeo, Inc., when he and his team wanted to bring the concept of the Short Messaging Service (SMS) online, so that a user could broadcast a message to anyone or to a specific group of followers [Sagolla, 2009].

  • We present the results of our evaluation and discuss how the creation of Word Graphs helps the overall summaries of the topics.

  • One popular automatic evaluation metric that has been adopted by the Document Understanding Conference (DUC) is ROUGE.
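
As a rough illustration of the ROUGE metric mentioned above, the following minimal sketch computes ROUGE-1, the unigram-overlap variant, between a candidate summary and a single reference summary. The function name `rouge_1` and the lowercase, whitespace-based tokenization are assumptions made for this example; the official ROUGE toolkit offers further options (such as stemming and stop-word removal) that are not reproduced here, and this is not necessarily the exact configuration used in the evaluation.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram-overlap scores between a candidate and one reference summary.

    Tokenization is a simplifying assumption: lowercase, split on whitespace.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each unigram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand_counts & ref_counts).values())

    ref_total = sum(ref_counts.values())
    cand_total = sum(cand_counts.values())

    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0

    return {"recall": recall, "precision": precision, "f1": f1}


if __name__ == "__main__":
    scores = rouge_1("the cat sat", "the cat sat on the mat")
    print(scores)
```

Recall is the figure traditionally reported for ROUGE-1, since it measures how much of the reference summary the candidate covers; precision and F1 are included here only for completeness.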

Summary

Introduction

Our goal is to determine whether using Word Graphs to induce aspects improves the overall summarization process, and whether sentiment temperatures rank aspects correctly as most positive, most negative, or most neutral. For evaluating summaries prior to Word Graph construction (first workflow), each volunteer was given three sets of tweets for their assigned topic: positive, negative, and neutral. For evaluating summaries after Word Graph construction (second workflow), volunteers were given three sets of tweets for each topic: positive, negative, and neutral. Each of these sets contained four further subsets of tweets. These subsets corresponded to only the top four aspects ranked by sentiment temperature, as determined by SentiWordNet and aspect information. For each aspect, volunteers were required to group the tweets into four clusters and pick a representative tweet from each cluster to obtain a four-sentence summary for that aspect. We used Sharifi's Phrase Reinforcement algorithm for comparison.

Motivation
Approach Overview
Contributions
Thesis Organization
Named Entity Recognition and Annotation
Determining Semantics in Tweets
Microblog Summarization
Summarization based on Corpus Snapshot
Summarization based on Topic
Sentiment Analysis
Multinomial Naive Bayes
Feature Selection for Multinomial Naive Bayes
Recursive Neural
Word Graphs and Graph Clustering Techniques
Document Summarization Techniques
Agglomerative Clustering
Bisect K-Means++ Clustering
Summary
Data collection
Assigning Tweets to Topics
Preprocessing the Tweets
Word Graph Construction
Clustering Techniques for Word Graphs for Aspects Extraction
Clustering Techniques for Documents (Microblogs)
Content Scores
Evaluation Methods
ROUGE-1 Scores
Aspect Ranking by Sentiment Temperatures
Chapter 6. Summary and Conclusions
Topic Selection
Clustering Examples
Process Overview