A Methodology to Handle Heterogeneous Data Generated in Online Social Networks

K Sailaja Kumar, Pratap Rudra Sahoo, D Evangelin Geetha

doi:10.1166/jctn.2020.9025

Abstract

Analyzing the heterogeneous data generated by social networking sites is a research challenge, and Twitter is one of the largest such sites. This paper presents a methodology for processing heterogeneous data that categorizes the data obtained from Twitter into different directories and analyzes the text data explicitly. The methodology is implemented in the Python programming language. Python's tweepy package is used to download Twitter stream data, which includes images, videos and text. The Aylien API is used from Python to analyze the Twitter text data and generate a sentiment analysis report, and Python's matplotlib package is used to draw a pie chart that visualizes the sentiment analysis results. Further, an algorithm is proposed for sentiment analysis that not only categorizes tweets as positive, negative or neutral (as the Aylien API does), but also categorizes them as strongly or weakly positive and strongly or weakly negative, based on polarity and subjectivity. The Django platform and Python's TextBlob package are used to implement this algorithm. For the experiment, data was collected from Twitter using hashtags related to different events and topics, such as IPL2018, World Cup2018, Modi and Delete Facebook, during the period Monday, January 22, 2018 to Monday, May 28, 2018. The collected data is processed with TextBlob: sentiment analysis is performed on the text data, and visual reports are generated using Google Charts. The results obtained from the two approaches are compared, and it is observed that the proposed algorithm gives a better sentiment analysis of the tweets.
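The five-way categorization described in the abstract can be illustrated with a minimal sketch. It assumes each tweet has already been scored with a polarity in [-1, 1] and a subjectivity in [0, 1], which are the ranges TextBlob reports. The function names `classify` and `summarize` and both cutoff values are hypothetical choices made for illustration; the abstract does not state the thresholds the authors use, so this should not be read as their algorithm.

```python
from collections import Counter


def classify(polarity, subjectivity, strong_cutoff=0.5, subj_cutoff=0.5):
    """Map a (polarity, subjectivity) pair to one of five labels.

    The cutoffs are assumptions for illustration only; the source
    does not specify the values used in the proposed algorithm.
    """
    if polarity == 0:
        return "neutral"
    direction = "positive" if polarity > 0 else "negative"
    # Treat a tweet as "strong" only when it is both clearly polar
    # and clearly opinionated (subjective).
    is_strong = abs(polarity) >= strong_cutoff and subjectivity >= subj_cutoff
    strength = "strongly" if is_strong else "weakly"
    return f"{strength} {direction}"


def summarize(scores):
    """Count how many (polarity, subjectivity) pairs fall under each label."""
    return Counter(classify(p, s) for p, s in scores)


# Made-up scores, not data from the paper.
example_scores = [(0.8, 0.9), (0.2, 0.9), (-0.7, 0.6), (-0.7, 0.2), (0.0, 0.3)]
example_counts = summarize(example_scores)
```

With TextBlob installed, the two inputs for a tweet would typically come from `TextBlob(text).sentiment`, which exposes `polarity` and `subjectivity`. The resulting counts could then feed a pie chart of the kind the abstract mentions.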

Full Text