Aspect Based Sentiment Analysis: Feature Extraction using Latent Dirichlet Allocation (LDA) and Term Frequency - Inverse Document Frequency (TF-IDF) in Machine Learning (ML)

Shakirah Mohd Sofi, Ali Selamat

doi:10.53840/myjict8-2-102

Abstract

The growth of social networks, blogs, forums, and e-commerce websites has produced a large and rapidly increasing volume of data, notably textual data. Twitter is one of the most popular social media platforms; during the COVID-19 pandemic, people all around the world used social media to share their opinions and concerns about the pandemic that changed their lives. This was reflected in a significant rise in tweets about the coronavirus, including positive, negative, and neutral tweets about the virus's impact. Sentiment analysis of such data faces challenges: sparse data limits understanding, while topic coherence and interpretability need improvement to yield clearer insights. The primary goal of this paper is to improve the accuracy and effectiveness of sentiment analysis of COVID-19 tweets through the application of advanced techniques and classifiers. We experiment with Support Vector Machine (SVM) and Naive Bayes (NB) classifiers on Twitter data to build high-accuracy machine learning models. Using Latent Dirichlet Allocation (LDA) for feature extraction, we aim to capture comprehensive aspects and topics for sentiment analysis. Additionally, we explore Count Vectorizer and Term Frequency - Inverse Document Frequency (TF-IDF) as word embedding techniques. The main objectives are to extract topics, understand public concerns about COVID-19, and compare classifier performance in Aspect-Based Sentiment Analysis on COVID-19 tweets. This paper presents advanced sentiment analysis techniques, such as LDA, Count Vectorizer, and SVM, that enable more nuanced sentiment analysis during the COVID-19 pandemic, achieving a notable 85% accuracy with SVM classification.
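To make the TF-IDF weighting mentioned in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the toy tweets, the whitespace tokenizer, and the helper names `tokenize` and `tf_idf` are all hypothetical, and the formula used is the unsmoothed textbook variant (relative term frequency times log(N / df)). Library implementations typically apply smoothing and normalization, so their values will differ.

```python
import math
from collections import Counter

# Hypothetical toy tweets, invented for illustration (not the paper's dataset).
tweets = [
    "vaccine rollout gives hope",
    "lockdown again no hope",
    "vaccine side effects worry me",
]

def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately simple tokenizer)."""
    return text.lower().split()

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    weight = (count of term in doc / doc length) * log(N / df),
    where N is the number of documents and df is the number of
    documents that contain the term.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term at most once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

weights = tf_idf(tweets)

# Print each tweet's terms from highest to lowest weight.
for tweet, w in zip(tweets, weights):
    ranked = sorted(w.items(), key=lambda kv: kv[1], reverse=True)
    print(tweet, "->", [(t, round(v, 3)) for t, v in ranked])
```

In this sketch, a term that appears in only one tweet (such as "rollout") receives a higher weight than a term shared across tweets (such as "vaccine"), which is the property that makes TF-IDF useful as input to classifiers like SVM or NB. Count-based vectorization corresponds to keeping only the raw `counts` and dropping the logarithmic factor.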

Full Text