Sentiment Prediction by a Classifier

Yunfei Ren

doi:10.54254/2755-2721/8/20230060

Abstract

In real life, there is far more unprocessed data than labeled data, which brings a large amount of data that cannot be directly used for machine learning training. Based on the tweet dataset processed by Natural Language Processing (NLP), this paper uses a variety of machine learning models for training and comparison. Moreover, different performances are analyzed and discussed. Since labeled datasets are difficult to obtain, the use of supervised learning will be limited. However, the number of unlabeled datasets is very large, which can provide a continuous training set for machine learning. This paper conducted a comparative experiment on the effect of semi-supervised learning and obtained better results than supervised learning and unsupervised learning. The experiments in this paper prove that semi-supervised learning can effectively use unlabeled data and train machine learning models.

Full Text