Abstract

With the large volume of text available online, it is becoming impractical to use supervised machine learning methods that require a sizeable training set of labelled data. In this paper we introduce a new sentiment-topic model called the hybrid sentiment-topic model (HST). The HST model is a completely unsupervised sentiment classification method that accounts for the topical context of words in documents when classifying sentiment. The only inputs the model needs are a list of positive seed words, a list of negative seed words, and the number of topics. The HST model differs from similar models in that it ensures each objective topic discovered has both a positive sentiment-topic and a negative sentiment-topic associated with it; other similar models do not guarantee symmetric sentiment-topics. The HST model performs three functions: first, it discovers objective topics in a corpus of text; second, it finds a positive and a negative sentiment-topic for each objective topic; and finally, it performs sentiment classification. The HST model is tested on a dataset of movie reviews and a dataset of social media posts. For each dataset, a variety of seed word lists and different numbers of topics are tested, and the HST model is then compared against similar sentiment-topic models. In all experiments conducted, the HST model outperformed similar sentiment-topic models in classification accuracy by a noticeable margin. Additionally, the HST model converged faster than similar models, and its accuracy was more stable during the Gibbs sampling process.

