Sentiment analysis based on machine learning models

Xinya Liao

doi:10.54254/2755-2721/54/20241434

Abstract

Sentiment analysis is a key research area in natural language processing (NLP). Its significance lies in its capacity to analyze large volumes of data from social networks and extract valuable insights. While numerous studies focus on developing and improving models and techniques for sentiment analysis, comparatively few evaluate and compare how these models perform. This paper assesses the effectiveness of four machine learning models: k-nearest neighbor (KNN), random forest, multinomial naive Bayes, and logistic regression. The aim is to clarify their relative effectiveness. The data come from two datasets, SST-2 and IMDB. SST-2 is used for training and testing, and IMDB is used for further testing. Each model is combined with term frequency-inverse document frequency (TF-IDF) feature extraction and applied to both datasets. The results show that all four models perform well on SST-2, but KNN and random forest perform poorly on IMDB.
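The abstract does not specify the exact TF-IDF settings or model hyperparameters. The following is therefore only a minimal, dependency-free sketch of the general pipeline: TF-IDF features feeding a KNN classifier with cosine similarity. The smoothed IDF and L2 normalization mirror the defaults of scikit-learn's `TfidfVectorizer`. The toy sentences, the helper names (`fit_idf`, `tfidf_vector`, `knn_predict`), and `k=3` are all hypothetical illustrations, not the author's code or data.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep alphabetic tokens (apostrophes allowed).
    return re.findall(r"[a-z']+", text.lower())

def fit_idf(docs):
    # Smoothed IDF: idf(t) = ln((1 + n) / (1 + df(t))) + 1
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf_vector(text, idf):
    # Raw term count times IDF; terms not seen during fitting are dropped.
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    # L2-normalize so that a dot product equals cosine similarity.
    return {t: w / norm for t, w in vec.items()} if norm else vec

def cosine(u, v):
    # Both vectors are unit length (or empty), so the dot product is the cosine.
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def knn_predict(text, train_vecs, labels, idf, k=3):
    # Rank training documents by similarity and take a majority vote of the top k.
    q = tfidf_vector(text, idf)
    sims = sorted(((cosine(q, v), y) for v, y in zip(train_vecs, labels)), reverse=True)
    votes = Counter(y for _, y in sims[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data (1 = positive, 0 = negative), standing in for SST-2.
train = [
    ("a wonderful, moving film", 1),
    ("brilliant acting and a great story", 1),
    ("great fun from start to finish", 1),
    ("a dull and boring mess", 0),
    ("terrible script, awful acting", 0),
    ("boring, predictable and too long", 0),
]
texts = [t for t, _ in train]
labels = [y for _, y in train]
idf = fit_idf(texts)
train_vecs = [tfidf_vector(t, idf) for t in texts]

print(knn_predict("a great and moving story", train_vecs, labels, idf))
print(knn_predict("awful, dull and boring", train_vecs, labels, idf))
```

Because `tfidf_vector` discards any term missing from the training vocabulary, the features available at test time depend on how much vocabulary the training and test corpora share. This is worth keeping in mind when a model fitted on one dataset (SST-2 here) is evaluated on another (IMDB).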

Full Text