Abstract

Opinion (sentiment) analysis of big data streams, from the text constantly generated on social media networks to hundreds of millions of online consumer reviews, gives organizations in every field an opportunity to discover valuable intelligence in massive user-generated text. However, traditional content analysis frameworks cannot efficiently handle the unprecedented volume of unstructured text streams or the complexity of the text analysis tasks that real-time opinion analysis requires. In this paper, we propose a parallel real-time sentiment analysis system, Social Media Data Stream Sentiment Analysis Service (SMDSSAS), which performs multiple phases of sentiment analysis on social media text streams in real time, using two fully analytic opinion mining models to cope with the scale of the text streams and the complexity of sentiment analysis on unstructured text. The two aspect-based opinion mining models, Deterministic and Probabilistic, support real-time sentiment analysis of data streams related to a user-given topic. In experiments on Twitter stream traffic captured during the pre-election weeks of the 2016 Presidential election, analyzing public opinion toward two presidential candidates in real time, the proposed system correctly predicted Donald Trump as the winner. Cross-validation showed that the proposed sentiment models, together with the real-time streaming components of our framework, analyzed opinions on the two candidates with an average accuracy of 81% for the Deterministic model and 80% for the Probabilistic model, a 1% to 22% improvement over results reported in the existing literature.

Highlights

  • In the era of web-based social media, user-generated content in any form, including blogs, wikis, forums, posts, chats, tweets, and podcasts, has become the norm for expressing people’s opinions

  • We propose a parallel real-time sentiment analysis system, Social Media Data Stream Sentiment Analysis Service (SMDSSAS), which performs multiple phases of sentiment analysis on social media text streams in real time, using two fully analytic opinion mining models to cope with the scale of the text streams and the complexity of sentiment analysis on unstructured text

  • We propose two sentiment models that combine topic-, lexicon- and aspect-based sentiment analysis and can be applied to real-time big data streams together with recent natural language processing (NLP) techniques: Deterministic Topic Model, which accurately measures user sentiment in terms of the subjectivity and the context of user-provided topic word(s)


Summary

Introduction

In the era of web-based social media, user-generated content in any form, including blogs, wikis, forums, posts, chats, tweets, and podcasts, has become the norm for expressing people’s opinions. Traditional content analysis takes days or weeks to complete, so opinion analysis of such large streams of user-generated text has demanded research and development of a new generation of analytics methods and tools that can process them effectively in real time or near-real time. Big data is often defined by three characteristics, volume, velocity and variety [1] [2], because it consists of constantly generated, massive data sets with large, varied and complex structures, or often no structure at all (e.g. tweet text). These three characteristics make big data difficult to store, analyze and visualize for further processing with traditional data analysis systems. Topic-based opinion mining seeks to extract personal viewpoints and emotions surrounding social or political events by semantically orienting user-generated content that has been correlated by topic word(s) [22].
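As a rough illustration of the topic-based idea described above, the following minimal sketch keeps only the texts that mention a user-given topic word and averages a lexicon score over them. It is not the Deterministic or Probabilistic model proposed in the paper: the tiny `LEXICON`, the `tokenize` helper and the `topic_sentiment` function are hypothetical names and values assumed here purely for illustration.

```python
import re

# Hypothetical toy lexicon (an assumption for this sketch); a real system
# would rely on a full sentiment lexicon or a trained model.
LEXICON = {"great": 1, "good": 1, "win": 1, "bad": -1, "terrible": -1, "lose": -1}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def topic_sentiment(texts, topic_words):
    """Average lexicon score over the texts that mention any topic word.

    Texts that contain none of the topic words are ignored. Returns 0.0
    when no text matches the topic.
    """
    topics = {word.lower() for word in topic_words}
    scores = []
    for text in texts:
        tokens = tokenize(text)
        if not topics.intersection(tokens):
            continue  # not correlated with the topic, so skip it
        scores.append(sum(LEXICON.get(token, 0) for token in tokens))
    return sum(scores) / len(scores) if scores else 0.0
```

Calling `topic_sentiment(posts, ["debate"])` on a list of post strings would score only the posts that contain the word "debate". A streaming system would apply the same filter-then-score step to each incoming batch rather than to a fixed list.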

Related Works
Architecture of Big Data Stream Analytics Framework
Sentiment Model
Context Identification
Measure of Subjectivity in Sentiment
Deterministic Topic Model
Probabilistic Topic Model
Multinomial Naive Bayes
Experiments
Predicting the Outcome of 2016 Presidential Election in Pre-Election Weeks
Predicting with Deterministic Topic Model
Findings
Conclusions
