Social media platforms such as Facebook, Twitter, Instagram, and YouTube have become central to communication and entertainment, and users routinely share opinions on them across a wide range of topics. These opinions, commonly categorized as positive, negative, or neutral, are valuable data for sentiment analysis. We analyzed political YouTube comments about India’s Bharatiya Janata Party (BJP) and Indian National Congress (INC) using the AFINN lexicon together with machine learning techniques. We represented the comments with Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) features and trained five machine learning algorithms: Multinomial Naïve Bayes, Logistic Regression, Random Forest, Support Vector Machine (SVM), and K-Nearest Neighbors (K-NN). To identify the most effective sentiment analysis approach, we compared the models using standard evaluation metrics. On the BJP dataset, Logistic Regression performed best with BoW, while SVM was most effective with TF-IDF. On the INC dataset, Random Forest performed best with BoW, and SVM again outperformed the others with TF-IDF. The AFINN lexicon performed poorly on both datasets, and K-NN consistently achieved lower accuracy. These findings suggest that SVM and Random Forest are more suitable for political sentiment analysis.
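The three text-processing steps named in the abstract (lexicon scoring, BoW, and TF-IDF) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the study's implementation: the lexicon `TOY_LEXICON` uses made-up AFINN-style scores rather than the real AFINN word list, the comments are invented, the helper names are hypothetical, and the TF-IDF weighting is the plain unsmoothed `ln(N / df)` variant, which may differ from the one the authors used.

```python
import math
from collections import Counter

# Toy AFINN-style lexicon: word -> integer valence.
# Illustrative values only, NOT the real AFINN word list.
TOY_LEXICON = {"good": 3, "great": 3, "bad": -3, "corrupt": -3, "hope": 2}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def lexicon_label(text, lexicon=TOY_LEXICON):
    """Sum the word valences and map the total to a sentiment label."""
    score = sum(lexicon.get(tok, 0) for tok in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def bag_of_words(docs):
    """Return the sorted vocabulary and one raw term-count vector per document."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Weight each raw count by idf = ln(N / df) (assumed, unsmoothed variant)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each vocabulary word.
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    return vocab, [[c * w for c, w in zip(vec, idf)] for vec in counts]


# Invented example comments, not drawn from the study's datasets.
comments = ["great policy good hope", "bad corrupt policy", "policy announced today"]
labels = [lexicon_label(c) for c in comments]
vocab, bow = bag_of_words(comments)
_, tfidf = tf_idf(comments)
```

In this sketch a word that occurs in every comment, such as "policy", receives a TF-IDF weight of zero, while BoW keeps its raw count. That difference in how common terms are weighted is one plausible reason the same classifier can rank differently under the two representations. The resulting vectors would then be passed to a classifier, which is omitted here.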