Fine Grained Sentiment Analysis of Malayalam Tweets Using Lexicon Based and Machine Learning Based Approaches

Soumya S, Pramod K V

doi:10.1109/icnte51185.2021.9487741

Abstract

This work implements Fine-Grained Sentiment Analysis (FGSA) of Malayalam tweets. The tweets are classified into five sentiment classes: strongly positive, positive, neutral, negative, and strongly negative. Both lexicon-based and machine learning-based approaches are used for the classification. Lexicon-based methods can be dictionary-based or corpus-based; this work uses the dictionary-based approach. For the machine learning approach, Support Vector Machine (SVM) and Random Forest (RF) classifiers are used, and the input dataset is vectorized with Bag of Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), and SentiWordNet feature matrices. The lexicon-based approach achieved an accuracy of 84.8%. Among the machine learning algorithms, SVM (kernel = linear), SVM (kernel = RBF), and RF with the SentiWordNet feature vector achieved accuracies of 92.6%, 92.9%, and 93.4%, respectively.
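To make the idea of dictionary-based five-class classification concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each tweet is already tokenized, that a lexicon maps words to integer polarity scores, and that the summed score is thresholded into the five classes. The lexicon entries (English placeholders rather than Malayalam words), the function name `classify`, and the threshold `strong` are all hypothetical choices made for illustration.

```python
from typing import Dict, List

# Hypothetical toy lexicon: word -> polarity score.
# The English placeholder entries and their scores are illustrative only;
# they are not taken from the paper's Malayalam lexicon.
TOY_LEXICON: Dict[str, int] = {
    "good": 1,
    "excellent": 2,
    "bad": -1,
    "terrible": -2,
}


def classify(tokens: List[str],
             lexicon: Dict[str, int] = TOY_LEXICON,
             strong: int = 2) -> str:
    """Sum the polarity of known tokens and map the total to one of five classes.

    `strong` is an assumed cut-off: totals at or beyond it (in either
    direction) are labelled "strongly" positive or negative.
    """
    # Words missing from the lexicon contribute nothing to the score.
    score = sum(lexicon.get(token.lower(), 0) for token in tokens)

    if score >= strong:
        return "strongly positive"
    if score > 0:
        return "positive"
    if score <= -strong:
        return "strongly negative"
    if score < 0:
        return "negative"
    return "neutral"
```

Under these assumptions, `classify(["good", "movie"])` sums to 1 and is labelled positive, while a tweet with no lexicon words is labelled neutral. A real system would also need Malayalam-specific preprocessing, which this sketch leaves out.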

Full Text