Simulating real-time tweet sentiment analysis by different machine learning methods based on Spark

Ertong Wei

doi:10.54254/2753-8818/18/20230402

Open Access

Abstract

Sentiment analysis is essential because it benefits many fields, such as politics and economics. Because a large volume of data is generated every moment, a real-time processing system allows sentiment to be analyzed efficiently. This paper uses Spark to simulate real-time tweet sentiment analysis and compares the performance of three machine learning methods: Logistic Regression, Naive Bayes, and Decision Tree. In the simulated system, Spark Streaming sends a batch of tweets at a fixed interval to a machine learning pipeline that predicts the sentiment of each tweet. The pipeline first tokenizes the tweets and then removes stop words. Next, the author uses TF-IDF to extract features, converting the unstructured text into structured numerical vectors. In the final stage, a machine learning classifier predicts the sentiment of each tweet. In the comparison, Logistic Regression performs best, followed by Naive Bayes, while Decision Tree performs worse than the other two methods.
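The four pipeline stages described in the abstract (tokenization, stop-word removal, TF-IDF, classification) applied to fixed batches of tweets can be illustrated without a Spark cluster. The following is a minimal, standard-library-only Python sketch under stated assumptions, not the author's implementation: the names `tokenize`, `remove_stop_words`, `fit_idf`, `tf_idf`, `NaiveBayes`, and `micro_batches` are hypothetical, the stop-word list and tweets are toy data, and fixed-size batches stand in for Spark Streaming's fixed time interval. The IDF step uses the smoothed formula documented for Spark MLlib, idf(t) = log((m + 1) / (df(t) + 1)). Naive Bayes is shown only because it is the shortest of the three compared classifiers to write by hand.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, abbreviated stop-word list; Spark ML's StopWordsRemover
# uses a much larger default English list.
STOP_WORDS = {"a", "an", "and", "i", "is", "it", "so", "the", "this", "to", "was", "what"}


def tokenize(text):
    """Stage 1: lowercase a tweet and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    """Stage 2: drop tokens that carry little sentiment information."""
    return [t for t in tokens if t not in STOP_WORDS]


def preprocess(text):
    return remove_stop_words(tokenize(text))


def fit_idf(docs):
    """Stage 3a: learn IDF weights with the smoothed formula
    idf(t) = log((m + 1) / (df(t) + 1)), m = number of training tweets."""
    m = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log((m + 1) / (n + 1)) for t, n in df.items()}


def tf_idf(doc, idf):
    """Stage 3b: map a token list to a sparse {term: tf * idf} vector.
    Terms unseen during fitting are dropped."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}


class NaiveBayes:
    """Stage 4: multinomial Naive Bayes with Laplace smoothing,
    fed TF-IDF weights instead of raw counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for v in vectors for t in v}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            rows = [v for v, y in zip(vectors, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(vectors))
            weight = defaultdict(float)
            for v in rows:
                for t, w in v.items():
                    weight[t] += w
            total = sum(weight.values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {t: math.log((weight[t] + self.alpha) / total)
                               for t in self.vocab}
        return self

    def predict(self, vector):
        def score(c):
            return self.log_prior[c] + sum(
                w * self.log_lik[c][t] for t, w in vector.items() if t in self.vocab)
        return max(self.classes, key=score)


def micro_batches(stream, batch_size):
    """Stand-in for Spark Streaming: yield the stream one fixed-size batch at a time."""
    for i in range(0, len(stream), batch_size):
        yield stream[i:i + batch_size]


# Hypothetical toy training data: 1 = positive, 0 = negative.
TRAIN = [
    ("I love this phone, it is great", 1),
    ("What a wonderful happy day", 1),
    ("Great game, love the team", 1),
    ("I hate this phone, it is awful", 0),
    ("What a terrible sad day", 0),
    ("Awful game, hate the referee", 0),
]

train_docs = [preprocess(text) for text, _ in TRAIN]
idf = fit_idf(train_docs)
model = NaiveBayes().fit([tf_idf(d, idf) for d in train_docs],
                         [label for _, label in TRAIN])

# Simulated incoming tweets, classified one batch at a time.
stream = ["Love the great team", "Hate this awful referee",
          "What a happy wonderful day", "Sad terrible game"]
results = []
for batch in micro_batches(stream, batch_size=2):
    preds = [model.predict(tf_idf(preprocess(t), idf)) for t in batch]
    results.append(preds)
    print(preds)  # prints [1, 0] for each of the two batches
```

In Spark itself, these stages would roughly correspond to `pyspark.ml` components such as `Tokenizer`, `StopWordsRemover`, `HashingTF` and `IDF`, followed by `LogisticRegression`, `NaiveBayes`, or `DecisionTreeClassifier`, chained in a `Pipeline`; the exact configuration used in the paper is not stated in this abstract. Fitting the IDF weights and the classifier once on training data and then reusing them for every incoming batch mirrors how a fitted pipeline model would be applied to each streaming batch.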

Full Text