Hybrid classifier for sentiment analysis in Malayalam with modified TF-IDF features

Pramitha P Ambily, John T Abraham

doi:10.1142/s1793962323500381

Abstract

Sentiment Analysis (SA) is the computational study of people’s opinions and attitudes as expressed in their written text. Malayalam, the mother tongue of Keralites, is the language they most often use to express themselves on Twitter. Because no automatic sentiment analyzer exists for Malayalam, SA of Malayalam Twitter messages is needed. This research work introduces a Malayalam sentiment analysis model. The raw input data, in the form of reviews, are first passed to a pre-processing stage, which includes sentence tokenization via Sandhi splitting and part-of-speech (POS) tagging. Subsequently, four feature sets are considered for feature vector formation: Bag of Words, the proposed weightage-based Term Frequency-Inverse Document Frequency (TF-IDF), unigram with the dictionary, and unigram with the dictionary including negation words. Finally, the reviews are classified by the proposed hybrid classifier, which combines a Long Short-Term Memory (LSTM) network and a Convolutional Neural Network (CNN). To enhance classification performance, the weights of the CNN classifier are fine-tuned via Improved Bumble Bee Mating Optimization (IBBMO), an advanced version of standard BBMO. The performance of the proposed work is compared with that of other conventional models in terms of positive measures, negative measures, convergence, and other measures. At a training percentage of 90, the accuracy of the proposed method is 95.436, which is higher than that of existing works: DBN (86.25), NN (87.080), SVM (87.234), WOA+Hybrid Classifier (88.440), LA+Hybrid Classifier (89.35285), CSO+Hybrid Classifier (92.021), BBMO+Hybrid Classifier (93.4248), and ML techniques (94.014).
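To make the feature vector formation step concrete, the following is a minimal Python sketch of Bag of Words counts and standard TF-IDF weights. It is not the authors' weightage-based TF-IDF, whose weighting scheme the abstract does not specify. The function names, the unsmoothed logarithmic IDF, and the English placeholder tokens (standing in for tokenized Malayalam reviews) are all assumptions made for illustration.

```python
import math
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Return raw term counts for one document, ordered by vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def tf_idf(documents):
    """Standard TF-IDF (assumed form): tf = count / doc length, idf = ln(N / df).

    `documents` is a list of token lists. Returns the sorted vocabulary and
    one weight vector per document.
    """
    vocabulary = sorted({term for doc in documents for term in doc})
    n_docs = len(documents)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    idf = {term: math.log(n_docs / doc_freq[term]) for term in vocabulary}

    vectors = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        vectors.append(
            [counts[term] / total * idf[term] if total else 0.0
             for term in vocabulary]
        )
    return vocabulary, vectors


# Placeholder tokens (hypothetical); real input would be Malayalam tokens
# produced by the pre-processing stage.
docs = [["good", "movie"], ["bad", "movie"], ["not", "good"]]
vocab, vecs = tf_idf(docs)
bow = bag_of_words(["good", "good", "movie"], vocab)
```

Terms that occur in fewer documents, such as "bad" and "not" here, receive a larger IDF than terms shared across documents. A modified scheme like the one the abstract proposes would presumably change how these weights are assigned.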

Full Text