Comparison of the Effects of Feature Selection and Tree-Based Ensemble Machine Learning for Sentiment Analysis on Indonesian YouTube Comments

Siti Khomsah, Ahmad Fathan Hidayatullah, Agus Sasmito Aribowo

doi:10.1007/978-981-33-6926-9_15

Abstract

The main problems in sentiment analysis models for Indonesian YouTube comments are unstructured data and low classification accuracy. Sentiment analysis for Indonesian, which differs from English, requires appropriate preprocessing and classification methods. Previous research has usually used Linear Support Vector Machine (SVM), Naive Bayes, and Decision Tree. Although SVM is more accurate than the other algorithms, its accuracy still needs to be improved. This study compares the performance of tree-based ensemble methods and feature selection in order to improve the sentiment analysis model for Indonesian YouTube comments. We crawled Indonesian YouTube comments from different domains and produced ten datasets. Preprocessing consisted of stopword removal, slang-word conversion, and stemming. For feature selection, we tested two vectorizer methods: Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF). The models were built using six machine learning methods: four tree-based ensembles, intended to raise accuracy, plus Linear SVM and Decision Tree as comparisons. Among the ensembles, Random Forest and Extra Tree represent bagging, while AdaBoost and Gradient Boosting represent boosting. Across the experiments combining feature selection and ensemble machine learning, the best methods were Extra Tree, with an accuracy of 93.39%, and AdaBoost, with an accuracy of 92.53%. The choice of vectorizer, TF or TF-IDF, did not significantly affect classification accuracy.
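To make the difference between the two vectorizers concrete, the following is a minimal sketch of TF and TF-IDF weighting in plain Python. It is not the authors' implementation: the three toy comments, the whitespace tokenizer, the function names, and the textbook IDF formula log(N / df) are all assumptions chosen for illustration, and the stopword, slang, and stemming steps are omitted.

```python
import math
from collections import Counter

# Toy comments invented for illustration (not taken from the paper's datasets).
comments = [
    "video ini bagus sekali",
    "video ini jelek",
    "bagus sekali bagus",
]


def tokenize(text):
    """Lowercase and split on whitespace.

    Real preprocessing would also remove stopwords, convert slang,
    and stem; those steps are skipped in this sketch.
    """
    return text.lower().split()


def term_frequency(docs):
    """Return one Counter of raw term counts per document (TF)."""
    return [Counter(tokenize(doc)) for doc in docs]


def inverse_document_frequency(tf_rows):
    """Textbook IDF: log(N / df), df = number of documents containing the term."""
    n_docs = len(tf_rows)
    # Iterating a Counter yields each distinct term once, so this counts documents.
    df = Counter(term for row in tf_rows for term in row)
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tf_idf(tf_rows, idf):
    """Scale each raw count by the IDF weight of its term."""
    return [{term: count * idf[term] for term, count in row.items()}
            for row in tf_rows]


tf_rows = term_frequency(comments)
idf = inverse_document_frequency(tf_rows)
tfidf_rows = tf_idf(tf_rows, idf)

for comment, tf_row, tfidf_row in zip(comments, tf_rows, tfidf_rows):
    print(comment)
    print("  TF    :", dict(tf_row))
    print("  TF-IDF:", {t: round(w, 3) for t, w in tfidf_row.items()})
```

TF keeps raw counts, while TF-IDF down-weights terms that appear in many comments, so a rarer word such as "jelek" receives a larger weight than "video". Library vectorizers may use a different formula; scikit-learn's `TfidfVectorizer`, for example, applies a smoothed IDF and L2 normalization by default, so its values would not match this sketch.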

Similar Papers

Improve the Accuracy of Support Vector Machine Using Chi Square Statistic and Term Frequency Inverse Document Frequency on Movie Review Sentiment Analysis
Ukhti Ikhsani Larasati ... Much Aziz Muslim
Scientific Journal of Informatics | VOL. 6
24 May 2019

Cross-domain sentiment analysis model on Indonesian YouTube comment
Agus Sasmito Aribowo ... Halizah Basiron
International Journal of Advances in Intelligent Informatics | VOL. 7
31 Mar 2021

An efficient approach for improving customer Sentiment Analysis in the Arabic language using an Ensemble machine learning technique
Nouri Hicham ... Nassera Habbat
12 Dec 2022

Efficient feature selection techniques for sentiment analysis
Avinash Madasu ... Sivasankar Elango
Multimedia Tools and Applications | VOL. 79
14 Dec 2019
