Abstract

Sentiment analysis aims to determine a writer's view of a product, event, government policy, service, topic, or individual from text they have written on social media platforms such as Twitter and Facebook. This study considers two datasets, STS-Gold and IMDb, which come from different domains and contain texts of different lengths. The objective is to determine which classification algorithm performs better across these two domains and text lengths. We applied six machine learning algorithms (support vector machine, logistic regression, K-nearest neighbors, random forest, Naïve Bayes, and decision tree) and compared them on the basis of F-score, precision, recall, and accuracy. On the IMDb dataset, logistic regression performs best, with the highest accuracy of 96.3% and an F-score of 80.6%; Naïve Bayes is second, with an accuracy of 95.89% and an F-score of 80.05%. On the STS-Gold dataset, Naïve Bayes performs best, with the highest accuracy of 81.08% and an F-score of 42.45%; logistic regression is second, with an accuracy of 80.09% and an F-score of 41.52%. Overall, logistic regression and Naïve Bayes outperform the other algorithms on both datasets.

Keywords: Classification, TF-IDF, Text mining, Sentiment analysis, Machine learning
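The four evaluation measures named in the abstract can be computed from the counts of a binary confusion matrix. The following is a minimal sketch, not the authors' evaluation code: the function name `binary_metrics`, the choice of `1` as the positive class, and the toy labels are all assumptions made for illustration, and the abstract does not say how the F-score was averaged across classes.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F-score for one positive class.

    Hypothetical helper for illustration; `positive` marks the class
    treated as the positive sentiment label (an assumption).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Toy labels invented for illustration; they come from neither dataset.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy case accuracy is 0.8 while the F-score is 0.5, because most of the correct predictions fall on the majority class. This shows only that the two measures can diverge; it is not a claim about why they differ in the reported results.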
