Abstract
This study compares the effectiveness of Naive Bayes and Logistic Regression classifiers for sentiment analysis of movie reviews, using two feature extraction approaches: Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF). We used a dataset of 50,000 IMDB reviews, preprocessed with denoising, stop word removal, and stemming, and then transformed into vectors with each of the two approaches. Logistic Regression surpasses Naive Bayes in accuracy: it achieves 89.52% with BoW and 89.23% with TF-IDF, while Naive Bayes achieves 85.01% with BoW and 85.74% with TF-IDF. Naive Bayes nevertheless performs consistently, with a minimal gap between training and testing accuracy, which indicates strong generalization despite its lower accuracy. It therefore remains a strong contender because of its simplicity and its consistent performance across feature extraction methods. This comparison offers practical guidance for choosing classifiers and feature extraction techniques for text classification problems in sentiment analysis.
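To make the two feature extraction approaches concrete, the following is a minimal sketch in plain Python, not the study's actual implementation. It builds BoW and TF-IDF features and feeds each into a multinomial Naive Bayes classifier with Laplace smoothing. The toy corpus, the function names, the smoothed IDF formula, and the choice to use fractional TF-IDF weights as if they were counts are all assumptions made for illustration. Logistic Regression and the preprocessing steps (denoising, stop word removal, stemming) are omitted for brevity.

```python
import math
from collections import Counter, defaultdict

# Toy corpus invented for illustration; it is not the IMDB data.
TRAIN = [
    ("a wonderful moving film", "pos"),
    ("great acting and a great story", "pos"),
    ("a boring and terrible film", "neg"),
    ("terrible acting and a dull story", "neg"),
]


def tokenize(text):
    """Lowercase and split on whitespace (stands in for real preprocessing)."""
    return text.lower().split()


def bow(text):
    """Bag of Words: map each token to its raw count in the text."""
    return Counter(tokenize(text))


def idf_table(docs):
    """Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + c)) + 1 for tok, c in df.items()}


def tfidf(text, idf):
    """TF-IDF: raw count times IDF; tokens unseen in training are dropped."""
    return {tok: c * idf[tok] for tok, c in bow(text).items() if tok in idf}


def train_nb(samples, featurize, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing over feature weights."""
    vocab = set()
    weights = defaultdict(lambda: defaultdict(float))  # label -> token -> weight
    n_docs = Counter()
    for text, label in samples:
        n_docs[label] += 1
        for tok, w in featurize(text).items():
            vocab.add(tok)
            weights[label][tok] += w
    total_docs = sum(n_docs.values())
    model = {}
    for label in n_docs:
        denom = sum(weights[label].values()) + alpha * len(vocab)
        log_prior = math.log(n_docs[label] / total_docs)
        log_lik = {
            tok: math.log((weights[label].get(tok, 0.0) + alpha) / denom)
            for tok in vocab
        }
        model[label] = (log_prior, log_lik)
    return model


def predict(model, featurize, text):
    """Return the label with the highest log posterior score."""
    feats = featurize(text)

    def score(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(
            w * log_lik[tok] for tok, w in feats.items() if tok in log_lik
        )

    return max(model, key=score)


idf = idf_table([text for text, _ in TRAIN])


def tfidf_feats(text):
    return tfidf(text, idf)


bow_model = train_nb(TRAIN, bow)
tfidf_model = train_nb(TRAIN, tfidf_feats)

for review in ["a great film", "a terrible story"]:
    print(review, "->", predict(bow_model, bow, review),
          predict(tfidf_model, tfidf_feats, review))
```

The only difference between the two models is the `featurize` function passed in, which mirrors how the comparison holds the classifier fixed while swapping the feature representation. A real experiment would more likely use an established library and a held-out test split than this hand-rolled version.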