A Comparative Analysis of Feature Selection Algorithms in Cross Domain Sentiment Classification

Siddhi Nath Rajan, Lipika Goel, Neha Nandal, Sonam Gupta, Avdhesh Gupta, Pradeep Gupta

doi:10.2174/0126662558276889240125062857

Abstract

Background: Cross-domain sentiment classification (CDSC) is a well-researched field in sentiment analysis. The biggest challenge in CDSC arises from differences between domains and their features, which degrade model performance when source-domain features are used to predict sentiment in the target domain. To address this challenge, several feature selection methods can be employed to identify the most relevant features for training and testing in CDSC. Method: The primary objective of this study is to perform a comparative analysis of different feature selection methods on various CDSC tasks. Statistical test-based feature selection methods were implemented with 18 classifiers for the CDSC task. The impact of these feature selection methods was compared on Amazon product reviews from the DVD, Electronics, Kitchen, and TV domains. For each feature selection method, a total of 12 × 18 experiments were conducted by varying the source and target domain pairs from the Amazon product reviews dataset across the 18 classifiers. The performance evaluation measures are accuracy and F-score. Results: The experiments indicate that good performance on the CDSC task depends on several factors, from the choice of domain to the choice of feature selection method. We conclude that Electronics is the best training dataset, as it gives more precise results when testing on any of the other domains selected for our study. Conclusion: Cross-domain sentiment analysis is a dynamic and interdisciplinary field that offers valuable insights for understanding how sentiment varies across different domains.
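The experimental grid and the feature selection step described above can be illustrated with a minimal sketch. It rests on two assumptions that the abstract does not state: that the 12 configurations are the ordered source/target pairs of the four domains (4 × 3 = 12), and that chi-square is one of the statistical tests used. The function names, the binary-label encoding, and the toy data in the usage note are hypothetical and serve only as illustration; this is not the authors' implementation.

```python
from collections import Counter
from itertools import permutations

# Domains named in the abstract.
DOMAINS = ["DVD", "Electronics", "Kitchen", "TV"]
N_CLASSIFIERS = 18  # number of classifiers reported in the abstract


def domain_pairs(domains):
    """Return all ordered (source, target) pairs with source != target.

    Assumption: this is how the 12 configurations per method arise.
    """
    return list(permutations(domains, 2))


def chi2_scores(docs, labels):
    """Chi-square score of each term against a binary sentiment label.

    docs:   list of token lists (one per review)
    labels: list of 0/1 values (1 = positive), same length as docs
    Scores use term presence/absence per document (2x2 contingency table).
    Chi-square is an assumed example of a statistical test-based selector.
    """
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    pos_df, neg_df = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        for term in set(tokens):
            (pos_df if y == 1 else neg_df)[term] += 1

    scores = {}
    for term in set(pos_df) | set(neg_df):
        a = pos_df[term]  # positive reviews containing the term
        b = neg_df[term]  # negative reviews containing the term
        c = n_pos - a     # positive reviews without the term
        d = n_neg - b     # negative reviews without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

In a cross-domain run under these assumptions, `chi2_scores` would be computed on labeled source-domain reviews only, and the terms returned by `select_top_k` would define the feature space for both training on the source domain and testing on the target domain. For example, with the toy source reviews `[["great", "sound"], ["great", "picture"], ["poor", "sound"], ["poor", "picture"]]` and labels `[1, 1, 0, 0]`, the sentiment-bearing terms `great` and `poor` outrank the neutral terms `sound` and `picture`.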
