Abstract

Sentiment analysis is valuable in many fields, such as politics and economics. Because large volumes of data are generated every moment, a real-time processing system can analyze sentiment efficiently. This paper uses Spark to simulate real-time tweet sentiment analysis and compares the performance of three machine learning methods: Logistic Regression, Naive Bayes, and Decision Tree. The system uses Spark Streaming to send a batch of tweets at a fixed interval to a machine learning pipeline, which predicts the sentiment of each tweet. In the pipeline, tweets are first tokenized, and stop words are then removed. After that, the author uses TF-IDF to extract features, converting the data from unstructured text into structured feature vectors. In the last stage, a machine learning method predicts the sentiment of each tweet. In the comparison, Logistic Regression performs best, followed by Naive Bayes; Decision Tree performs worse than the other two methods.
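
The feature-extraction stages of the pipeline described above (tokenization, stop-word removal, TF-IDF) can be illustrated with a minimal plain-Python sketch. This is not the paper's Spark code: the function names, the tiny stop-word list, and the smoothed IDF formula are assumptions chosen for illustration, and the classifier stage is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "is", "are", "this", "i", "to", "and", "of"}


def tokenize(text):
    """Lowercase a tweet and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def tf_idf(docs):
    """Turn a list of token lists into dense TF-IDF vectors.

    Returns (vocabulary, vectors), where each vector is aligned with the
    sorted vocabulary. Uses raw term counts and a smoothed IDF,
    log((n + 1) / (df + 1)); other TF-IDF variants exist.
    """
    vocab = sorted({t for doc in docs for t in doc})
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log((n + 1) / (df[t] + 1)) for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vocab, vectors


def featurize(tweets):
    """Run the three preprocessing stages on a batch of raw tweets."""
    docs = [remove_stop_words(tokenize(t)) for t in tweets]
    return tf_idf(docs)


if __name__ == "__main__":
    batch = ["I love this phone", "I hate this phone", "The weather is great"]
    vocab, vectors = featurize(batch)
    print(vocab)
    for vec in vectors:
        print([round(x, 3) for x in vec])
```

Each call to `featurize` stands in for one batch of tweets arriving from the stream. In a real streaming setting, the IDF weights would normally be fitted on training data and reused for each batch rather than recomputed per batch as in this sketch.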
