Implementation of Support Vector Machine with Lexicon Based for Sentiment Analysis on Twitter

Nidaul Hasanati, Qurrotul Aini, Arndini Nuri

doi:10.1109/citsm56380.2022.9935887

Abstract

Twitter is one of the most widely used social media platforms, and Indonesia has the sixth-largest number of Twitter users in the world. This research is a quantitative study of fine-grained sentiment analysis that extracts sentiment about the Covid-19 vaccine from Twitter, with the aim of implementing the Support Vector Machine algorithm. The research flow follows the SEMMA method (Sample, Explore, Modify, Model, and Assess). In the sample stage, a data set of tweets is crawled from Twitter using the Twitter API; in the explore stage, the attributes of the data set are examined. The modify stage applies text preprocessing so that the data set is more structured. The model stage then applies a lexicon-based method to assign sentiment classes to the data set, and the labeled data are classified using the Naïve Bayes method and the Support Vector Machine. The final stage of SEMMA assesses the applied methods using a confusion matrix and k-fold cross-validation. For the Support Vector Machine, the best parameters found by grid search with cross-validation are the rbf kernel with $C = 100$ and degree = 0.01, resulting in an accuracy of 85%. The Support Vector Machine implementation thus produces good scores for the Covid-19 vaccine topic, so the algorithm can be applied to sentiment classification of new data.
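As a rough illustration of the lexicon-based labeling in the model stage and the confusion-matrix and k-fold evaluation in the assess stage, the following is a minimal Python sketch, not the authors' implementation. The tiny lexicon, the three-class mapping (positive, negative, neutral), the example tweets, and all function names are hypothetical assumptions made for illustration; the abstract does not specify the lexicon, the number of classes, or the number of folds.

```python
from collections import Counter

# Hypothetical toy lexicon (word -> polarity); NOT the lexicon used in the paper.
LEXICON = {"good": 1, "safe": 1, "effective": 1,
           "bad": -1, "afraid": -1, "dangerous": -1}


def lexicon_label(tweet, lexicon=LEXICON):
    """Sum the polarity of each token and map the total to a class.

    Assumes three classes; the paper's fine-grained scheme may differ.
    """
    score = sum(lexicon.get(token, 0) for token in tweet.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


def confusion_matrix(y_true, y_pred):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(y_true, y_pred))


def accuracy(cm):
    """Fraction of pairs on the diagonal of the confusion matrix."""
    total = sum(cm.values())
    correct = sum(n for (true, pred), n in cm.items() if true == pred)
    return correct / total if total else 0.0


# Hypothetical, already-preprocessed tweets (the output of the modify stage).
tweets = [
    "the vaccine is safe and effective",
    "i am afraid it is dangerous",
    "got my vaccine today",
]
labels = [lexicon_label(t) for t in tweets]
```

The labels produced this way would serve as training targets for the classifiers; a trained model's predictions on each held-out fold can then be passed to `confusion_matrix` and `accuracy`. In scikit-learn, the grid search over SVM parameters mentioned in the abstract corresponds to wrapping `SVC` in `GridSearchCV`, which is omitted here to keep the sketch dependency-free.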
