Improve the Accuracy of Support Vector Machine Using Chi Square Statistic and Term Frequency Inverse Document Frequency on Movie Review Sentiment Analysis

Ukhti Ikhsani Larasati, Riza Arifudin, Alamsyah Alamsyah, Much Aziz Muslim

doi:10.15294/sji.v6i1.14244

Open Access

https://doi.org/10.15294/sji.v6i1.14244

Abstract

Text mining techniques make it possible to process large volumes of text data and to extract the opinions they contain, whether positive or negative. Sentiment analysis applies text mining methods to determine whether a text in a dataset expresses a positive or a negative opinion. Support vector machine (SVM) is one of the classification algorithms that can be used for sentiment analysis. However, SVM performs less well on large datasets. In addition, text mining is constrained by the number of attributes used: a large number of attributes reduces the performance of the classifier and leads to low accuracy. The purpose of this research is to increase the accuracy of SVM by applying feature selection and feature weighting. Feature selection removes a large number of irrelevant attributes. In this study, features are selected based on the top value of K = 500. The selected attributes are then weighted to calculate the weight of each one. The feature selection method is the chi square statistic, and the feature weighting method is Term Frequency Inverse Document Frequency (TFIDF). In experiments using Matlab R2017b with 10-fold cross validation, integrating SVM with the chi square statistic and TFIDF increased accuracy by 11.5 percentage points: SVM without the chi square statistic and TFIDF achieved an accuracy of 68.7%, while SVM with the chi square statistic and TFIDF achieved an accuracy of 80.2%.
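
As a rough illustration of the feature selection step named in the abstract, the sketch below scores each term with the chi square statistic from a 2x2 term/class contingency table and keeps the K highest-scoring terms. It is not the authors' Matlab implementation: the function names, the toy reviews, the alphabetical tie-breaking, and the small value of K are all assumptions made for this example.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi square statistic for a 2x2 term/class contingency table.

    a: positive documents containing the term
    b: negative documents containing the term
    c: positive documents without the term
    d: negative documents without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A term present in every document (or in none) carries no signal.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def select_top_k(docs, labels, k, positive="pos"):
    """Rank terms by chi square score and return the k best (hypothetical helper)."""
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = len(labels) - n_pos
    pos_df, neg_df = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        # Count each term once per document (document frequency per class).
        (pos_df if y == positive else neg_df).update(set(tokens))
    scores = {}
    for term in set(pos_df) | set(neg_df):
        a, b = pos_df[term], neg_df[term]
        scores[term] = chi_square(a, b, n_pos - a, n_neg - b)
    # Highest score first; ties are broken alphabetically (an assumption).
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k], scores


# Toy tokenized reviews, invented for illustration only.
docs = [
    ["great", "movie", "great", "cast"],
    ["great", "story", "movie"],
    ["boring", "movie", "bad", "cast"],
    ["bad", "story", "movie"],
]
labels = ["pos", "pos", "neg", "neg"]

selected, scores = select_top_k(docs, labels, k=2)
print(selected)  # ['bad', 'great']
```

In this toy corpus, "movie" appears in every review and therefore scores 0, while "great" and "bad" each occur in only one class and receive the highest score. The study applies the same idea with a much larger K.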

Highlights

Technological developments support the distribution of information and make it easier for the public to obtain large amounts of information for free, including textual information
Text mining is similar to data mining; however, data mining tools are designed for structured data from databases, whereas text mining is designed for unstructured or semi-structured datasets such as Word documents, emails, and more
After applying the chi square statistic and Term Frequency Inverse Document Frequency (TFIDF), the support vector machine algorithm achieved its highest accuracy of 80.2% when the top value of K = 212, an accuracy increase of 11.5%
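
The TFIDF weighting mentioned in the last highlight can be sketched as follows. This is only an illustration under stated assumptions: it uses the raw term count as the term frequency and log(N / df) as the inverse document frequency, which is one common variant and may differ from the formula used in the paper. The function name, the toy reviews, and the two-term vocabulary are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs, vocabulary):
    """Weight each document over a fixed vocabulary of selected terms.

    Assumed variant: tf = raw count of the term in the document,
    idf = log(N / df), where df is the number of documents containing the term.
    """
    n_docs = len(docs)
    vocab_set = set(vocabulary)
    df = Counter()
    for tokens in docs:
        # Each document contributes at most 1 to a term's document frequency.
        df.update(set(tokens) & vocab_set)
    idf = {t: math.log(n_docs / df[t]) if df[t] else 0.0 for t in vocabulary}
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append([tf[t] * idf[t] for t in vocabulary])
    return vectors


# Toy tokenized reviews, invented for illustration only.
docs = [
    ["great", "movie", "great", "cast"],
    ["great", "story", "movie"],
    ["boring", "movie", "bad", "cast"],
    ["bad", "story", "movie"],
]
vocabulary = ["great", "bad"]  # stands in for the terms kept by feature selection

vectors = tfidf_vectors(docs, vocabulary)
for row in vectors:
    print([round(w, 3) for w in row])
```

Each row is the weighted feature vector for one review; vectors of this kind are what a classifier such as a support vector machine would be trained on.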

Summary

Introduction

Technological developments support the distribution of information and make it easier for the public to obtain large amounts of information for free, including textual information. Textual information can be divided into two categories: facts and opinions. A fact is an objective expression about an entity, an event, or the nature of an object, whereas an opinion is a subjective expression that describes a person's sentiments, views, or feelings about an entity, an event, or its nature. Textual information can be processed using text mining. According to [2], text mining can be broadly defined as a knowledge-intensive process in which users interact with datasets using analytical tools. Text mining is also known as text data mining [3]. Text mining is similar to data mining; however, data mining tools are designed for structured data from databases, whereas text mining is designed for unstructured or semi-structured datasets such as Word documents, emails, and more.

Journal: Scientific Journal of Informatics
Publication Date: May 24, 2019
Citations: 11
License type: cc-by

Similar Papers

ALGORITMA SUPPORT VECTOR MACHINE BERBASIS ALGORITMA GENETIKA UNTUK ANALISIS SENTIMEN PADA TWITTER (Genetic Algorithm-Based Support Vector Machine Algorithm for Sentiment Analysis on Twitter)
08 Aug 2015

Increasing Accuracy of Support Vector Machine (SVM) By Applying N-Gram and Chi-Square Feature Selection for Text Classification
Setiangga Fachrurrozi ... Farrikh Al Zami
18 Sep 2021

Comparison of the Effects of Feature Selection and Tree-Based Ensemble Machine Learning for Sentiment Analysis on Indonesian YouTube Comments
Siti Khomsah ... Agus Sasmito Aribowo
01 Jan 2020

Optimize Naïve Bayes Classifier Using Chi Square and Term Frequency Inverse Document Frequency For Amazon Review Sentiment Analysis
Anisa Falasari ... Much Aziz Muslim
Journal of Soft Computing Exploration | VOL. 3
30 Mar 2022
