A Comparative Study of Feature Selection and Machine Learning Algorithms for Arabic Sentiment Classification

Nazlia Omar, Mohammed Albared, Tareq Al-Moslmi, Adel Al-Shabi

doi:10.1007/978-3-319-12844-3_37

Abstract

Sentiment analysis is a challenging and important task that draws on natural language processing, web mining, and machine learning. It is more challenging in Arabic than in many other languages because of the morphological complexity of Arabic and the large variation among its dialects. This paper presents an empirical comparison of seven feature selection methods (Information Gain, Principal Components Analysis, Relief-F, Gini Index, Uncertainty, Chi-squared, and Support Vector Machines (SVMs)) and three machine learning classifiers (SVM, Naive Bayes, and K-nearest neighbor) for Arabic sentiment classification. A wide range of comparative experiments is conducted on the Opinion Corpus for Arabic (OCA). The experimental results demonstrate that feature selection improves the performance of Arabic sentiment classification, but that the size of the improvement depends on the method used and the number of features selected. Among the feature selection methods, SVM-based feature selection yields the best performance, and among the classifiers, SVM outperforms the other techniques. The best-performing combination is the SVM classifier with SVM-based feature selection, which reaches an accuracy of 92.4%.

Keywords

Arabic Sentiment Analysis, Opinion Mining, Machine Learning, Feature Selection
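
To make the idea of filter-style feature selection concrete, the following is a minimal sketch of Chi-squared term scoring, one of the seven methods compared. It is not the authors' implementation: the function names `chi_squared_scores` and `select_top_k`, the binary term-presence representation, and the four-document toy corpus (English tokens rather than OCA reviews) are all assumptions made for illustration.

```python
from collections import Counter


def chi_squared_scores(docs, labels):
    """Score each term with the chi-squared statistic of its 2x2
    (term present / absent) x (positive / negative) contingency table.

    Hypothetical helper for illustration; assumes labels are "pos" or "neg".
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == "pos")
    n_neg = n - n_pos

    # Document frequency of each term per class (binary presence).
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        (df_pos if y == "pos" else df_neg).update(set(tokens))

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # positive documents containing the term
        b = df_neg[term]   # negative documents containing the term
        c = n_pos - a      # positive documents without the term
        d = n_neg - b      # negative documents without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # A zero denominator means the term (or a class) carries no contrast.
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus (assumed data, not taken from OCA).
docs = [
    ["great", "film", "loved", "it"],
    ["great", "acting", "loved", "plot"],
    ["boring", "film", "hated", "it"],
    ["boring", "plot", "hated", "acting"],
]
labels = ["pos", "pos", "neg", "neg"]

scores = chi_squared_scores(docs, labels)
selected = select_top_k(scores, 4)
print(selected)
```

In this toy setting the terms that occur in only one class receive the highest scores, while terms spread evenly across both classes score zero and would be dropped. The number of retained features, `k`, corresponds to the "number of features selected" that the abstract reports as influencing the results.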

Full Text