Solving the twitter sentiment analysis problem based on a machine learning-based approach

Fatemeh Zarisfi Kermani,Esfandiar Eslami,Faramarz Sadeghi

doi:10.1007/s12065-019-00301-x

Fatemeh Zarisfi Kermani, Esfandiar Eslami + Show 1 more

https://doi.org/10.1007/s12065-019-00301-x

Copy DOI

Abstract

Twitter Sentiment Analysis (TSA) as part of a text classification task has been widely attended by researchers in recent years. This paper presents a machine learning approach to solving the TSA problem in three phases. In the second phase, a suitable value for representing each feature in the Vector Space Model is determined through the weighted combination of the values obtained from four methods (i.e., Term Frequency and Inverse Document Frequency, semantic similarity, sentiment scoring using SentiWordNet, and sentiment scoring based on the class of tweets). In this manner, finding the percentage of contributions or weights of each method is defined as an optimization problem and solved using a genetic algorithm. Also, the weighted values obtained from four methods are combined based on the Einstein sum as an important T-conorm method. Finally, the performance of the proposed method is tested based on the accuracy of support vector machine and multinomial naive Bayes classification algorithms on four famous Twitter datasets, namely the Stanford testing dataset, STS-Gold dataset, Obama-McCain Debate dataset, and Strict Obama-McCain Debate dataset. The obtained results show the high superiority of the proposed method in comparison with the other methods.

Full Text