An Evaluation of SVM and Naive Bayes with SMOTE on Sentiment Analysis Data Set

Andrew Christian Flores, Christine F Pena, Ken D Gorro, Rogelyn I Icoy

doi:10.1109/iceast.2018.8434401

Publication Date: Jul 1, 2018

Citations: 25

Affiliation: University of San Carlos

Abstract

Data classification is central to data mining, and it has motivated many machine learning studies on preprocessing and algorithmic techniques. Class imbalance is a problem in data classification in which one class of data outnumbers another. Sentiment analysis is the evaluation of written and spoken language to determine a person's expressions, sentiments, emotions, and attitudes, and it is commonly used as a source of datasets in machine learning. This study is a comparative analysis of two classifiers, each combined with the Synthetic Minority Over-sampling Technique (SMOTE): a Support Vector Machine (SVM) trained with Sequential Minimal Optimization (SMO), and Naive Bayes Multinomial (NBM). Both were applied to the same sentiment analysis datasets, gathered by students of the University of San Carlos. Weka, a collection of machine learning algorithms for data mining with a graphical user interface (GUI), was used to preprocess and classify the datasets. The results showed that 10-fold cross-validation gave better results than a 70:30 split when testing SVM and NBM with SMOTE. However, the outcome also depends on how the datasets are preprocessed, especially when they contain noisy data.
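To make the oversampling step concrete, the following is a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed on the line segment between a minority sample and one of its k nearest minority neighbours. It is not the Weka filter used in the study. The function name `smote`, its parameters, and the toy feature vectors are all hypothetical and chosen only for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples.

    Illustrative sketch only; names and defaults are assumptions,
    not the configuration used in the study.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 8 majority vectors and 3 minority vectors.
majority = [[float(x), float(x % 3)] for x in range(8)]
minority = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5]]

# Generate enough synthetic points to match the majority class size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

When a sketch like this is combined with cross-validation or a train/test split, oversampling is usually applied to the training portion only, so that no synthetic points derived from test samples leak into evaluation.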

Full Text