Large Scale Implementations for Twitter Sentiment Classification

Andreas Kanavos, Dimitrios Tsolis, Athanasios Tsakalidis, Giannis Tzimas, Spyros Sioutas, Nikolaos Nodarakis

doi:10.3390/a10010033

Abstract

Sentiment analysis on Twitter data is a challenging problem due to the nature, diversity and volume of the data. People tend to express their feelings freely, which makes Twitter an ideal source for accumulating a vast number of opinions on a wide spectrum of topics. This information offers huge potential and can be harnessed to gauge the sentiment tendency towards these topics. However, since no one can read through all of these tweets, an automated decision-making approach is necessary. Nevertheless, most existing solutions are limited to centralized environments and can therefore process at most a few thousand tweets. Given the massive number of tweets published daily, such a sample is not representative enough to determine the sentiment polarity towards a topic. In this work, we develop two systems: the first in MapReduce and the second in Apache Spark, both frameworks for programming with Big Data. The algorithm exploits all hashtags and emoticons inside a tweet as sentiment labels and classifies diverse sentiment types in a parallel and distributed manner. Moreover, the sentiment analysis tool is based on Machine Learning methodologies alongside Natural Language Processing techniques and utilizes Apache Spark’s machine learning library, MLlib. To address the nature of Big Data, we introduce pre-processing steps that achieve better results in sentiment analysis, as well as Bloom filters that compact the storage size of intermediate data and boost the performance of our algorithm. Finally, the proposed system was trained and validated with real data crawled from Twitter, and, through an extensive experimental evaluation, we show that our solution is efficient, robust and scalable while confirming the quality of our sentiment identification.
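The two ideas named in the abstract, using hashtags and emoticons as sentiment labels and storing data compactly in a Bloom filter, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the emoticon table, the function `extract_labels`, the class `BloomFilter` and its parameters `num_bits` and `num_hashes` are all assumptions made for illustration.

```python
import hashlib
import re

# Hypothetical emoticon-to-label table; the paper's actual label set is not given here.
EMOTICON_LABELS = {":)": "positive", ":-)": "positive", ":(": "negative", ":-(": "negative"}
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_labels(tweet):
    """Return the set of sentiment labels (hashtags and mapped emoticons) in a tweet."""
    labels = {tag.lower() for tag in HASHTAG_RE.findall(tweet)}
    for token in tweet.split():
        if token in EMOTICON_LABELS:
            labels.add(EMOTICON_LABELS[token])
    return labels


class BloomFilter:
    """Fixed-size bit array: no false negatives, a small chance of false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


labels = extract_labels("Loving the new phone :) #happy")
seen = BloomFilter()
for label in labels:
    seen.add(label)
```

The filter occupies 128 bytes regardless of how many labels are added, which is the kind of storage saving a Bloom filter trades against a small false-positive rate.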

Highlights

Users disseminate information on many topics through short 140-character messages on Twitter called “tweets”
We propose a novel distributed framework implemented in Hadoop [10], the open source MapReduce implementation [9], as well as in Spark [11], an open source platform that translates the developed programs into MapReduce jobs
We evaluate our method using two Twitter datasets that we collected through the Twitter Search API [51] between November 2014 and August 2015
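For readers unfamiliar with the MapReduce model mentioned in the highlights, the following is a minimal in-process Python sketch of its map, shuffle and reduce phases, applied to counting words per sentiment label. It does not use the Hadoop or Spark APIs and is not the paper's algorithm; the function names `map_tweet`, `shuffle`, `reduce_counts` and `run_job`, and the sample tweets, are assumptions made for illustration.

```python
from collections import defaultdict


def map_tweet(tweet, label):
    """Map phase: emit ((label, word), 1) for each word of a labeled tweet."""
    for word in tweet.lower().split():
        yield (label, word), 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce phase: sum the counts collected for one key."""
    return key, sum(values)


def run_job(labeled_tweets):
    """Run map, shuffle and reduce sequentially in a single process."""
    emitted = (pair for tweet, label in labeled_tweets for pair in map_tweet(tweet, label))
    return dict(reduce_counts(key, values) for key, values in shuffle(emitted).items())


counts = run_job([
    ("great day", "positive"),
    ("great game", "positive"),
    ("bad day", "negative"),
])
```

In a real cluster the map and reduce functions run on different machines and the shuffle moves data across the network; here all three phases run in one process purely to show the data flow.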

Summary

Introduction

Users disseminate information on many topics through short 140-character messages on Twitter called “tweets”. They follow other users in order to receive their status updates. Twitter is a widespread instant messaging platform, and people use it to stay informed about world news, recent technological advancements, and so on. As a result, a variety of opinion clusters containing rich sentiment information is formed. Sentiment is defined as “A thought, view, or attitude, especially one based mainly on emotion instead of reason” [1] and describes someone’s mood or judgement towards a specific entity.


Journal: Algorithms
Publication Date: Mar 4, 2017
Citations: 46
License type: CC BY 4.0


Similar Papers

An Apache Spark Implementation for Sentiment Analysis on Twitter Data
Alexandros Baltas ... Andreas Kanavos
01 Jan 2017

A spark‐based big data analysis framework for real‐time sentiment prediction on streaming data
Deniz Kılınç
Software: Practice and Experience | VOL. 49
27 Jun 2019

Understanding Software-2.0
Malinda Dilhara ... Ameya Ketkar
ACM Transactions on Software Engineering and Methodology | VOL. 30
23 Jul 2021

Enhanced sentiment labeling and implicit aspect identification by integration of deep convolution neural network and sequential algorithm
Jinzhan Feng ... Xiaomeng Ma
Cluster Computing | VOL. 22
12 Jan 2018
