Abstract

This paper aims to analyze users' emotions automatically by analyzing Twitter, a representative social network service (SNS). Building sentiment analysis models with machine learning techniques requires data annotated with sentiment labels that represent positive or negative emotions. However, obtaining sentiment labels for tweets is very expensive. We therefore propose a sentiment analysis model based on the self-training technique, which makes use of data without sentiment labels as well as data with sentiment labels. In self-training, the sentiment of unlabeled data is determined by a model trained on the labeled data, and the model is then updated using the labeled data together with the newly labeled data. This technique improves sentiment analysis performance gradually. However, misclassifications of unlabeled data at an early stage affect model updates throughout the whole learning process, because the label assigned to an unlabeled item never changes once it has been determined. The sentiment of unlabeled data therefore needs to be determined carefully. To obtain high performance with self-training, we propose three policies for updating the labeled data and conduct a comparative analysis. The first policy selects, from the newly labeled data, only the items whose confidence is higher than a given threshold. The second policy chooses the same number of positive and negative items from the newly labeled data in order to avoid the imbalanced class learning problem. The third policy chooses no more than a given maximum number of newly labeled items, so that a large amount of data is not added at one time and the model is updated gradually. Experiments are conducted on the Stanford data set, which is classified into positive and negative classes. The results show that the learned model achieves higher performance than both a model learned from labeled data only and self-training with a regular model update policy.
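The self-training loop and the three update policies described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `by_threshold`, `balanced`, `capped`, and `self_train`, the default parameter values, and the `fit` callable are all hypothetical. The sketch also assumes that the balanced and capped policies keep the most confident items first, which the abstract does not specify.

```python
from typing import Callable, List, Sequence, Tuple

Labeled = Tuple[str, str]                      # (text, label), label is "pos" or "neg"
Scored = Tuple[int, str, float]                # (pool index, predicted label, confidence)
Model = Callable[[str], Tuple[str, float]]     # text -> (label, confidence)


def by_threshold(scored: Sequence[Scored], threshold: float = 0.8) -> List[Scored]:
    """Policy 1: keep predictions whose confidence is higher than the threshold."""
    return [s for s in scored if s[2] > threshold]


def balanced(scored: Sequence[Scored]) -> List[Scored]:
    """Policy 2: keep the same number of positive and negative predictions.

    Assumption: within each class, the most confident items are kept.
    """
    pos = sorted((s for s in scored if s[1] == "pos"), key=lambda s: s[2], reverse=True)
    neg = sorted((s for s in scored if s[1] == "neg"), key=lambda s: s[2], reverse=True)
    n = min(len(pos), len(neg))
    return pos[:n] + neg[:n]


def capped(scored: Sequence[Scored], max_count: int = 100) -> List[Scored]:
    """Policy 3: keep at most max_count predictions (assumed: most confident first)."""
    return sorted(scored, key=lambda s: s[2], reverse=True)[:max_count]


def self_train(
    labeled: Sequence[Labeled],
    unlabeled: Sequence[str],
    fit: Callable[[List[Labeled]], Model],
    select: Callable[[List[Scored]], List[Scored]],
    rounds: int = 5,
) -> Tuple[Model, List[Labeled]]:
    """Generic self-training loop; `fit` is a hypothetical trainer supplied by the caller."""
    labeled = list(labeled)
    pool = list(unlabeled)
    model = fit(labeled)
    for _ in range(rounds):
        if not pool:
            break
        # Label every item still in the pool with the current model.
        scored = [(i, *model(pool[i])) for i in range(len(pool))]
        chosen = select(scored)
        if not chosen:
            break
        # Assigned labels are never revisited, so early mistakes persist.
        labeled.extend((pool[i], label) for i, label, _ in chosen)
        taken = {i for i, _, _ in chosen}
        pool = [text for i, text in enumerate(pool) if i not in taken]
        model = fit(labeled)
    return model, labeled
```

A policy is passed in as the `select` argument, for example `select=lambda s: by_threshold(s, 0.9)`, so the same loop can be reused to compare the three policies against one another.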
