Unsupervised Learning Chinese Sentiment Lexicon from Massive Microblog Data

Shi Feng, Weili Xu, Lin Wang, Daling Wang, Ge Yu

doi:10.1007/978-3-642-35527-1_3

Abstract

Analyzing people’s feelings and emotions in social media has become a major concern for both academic researchers and commercial companies. The sentiment lexicon plays a crucial role in most sentiment analysis applications. However, existing thesaurus-based lexicon building methods suffer from coverage problems when faced with the new words and new meanings that appear in social media. Nowadays, millions of users share their opinions on different aspects of life every day in microblogs. In this paper, a novel method based on occurrence probability with emoticons is presented to learn candidate sentiment words from massive microblog data, and the accuracy of the learned lexicon is further improved by using the whole microblog space as the corpus. Extensive experiments were conducted on real-world datasets covering different topics. The results show that the proposed method is able to extract emerging words, and that the learned lexicon outperforms two well-known Chinese lexicons in classifying the sentiments in microblogs.

Keywords: Sentiment Analysis, Negative Word, Learn Lexicon, Sentiment Category, Sentiment Lexicon
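The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea of using emoticons as noisy sentiment labels. It is not the authors' method. The emoticon sets, the function name `score_words`, the `min_count` threshold, and the score `(p - n) / (p + n)` are all assumptions made for illustration, and the posts are assumed to be already segmented into tokens.

```python
from collections import Counter

# Hypothetical seed emoticons; the paper's actual emoticon set is not stated here.
POSITIVE_EMOTICONS = {":)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":'("}


def score_words(posts, min_count=2):
    """Score words by how often they co-occur with positive vs. negative emoticons.

    posts: iterable of token lists (one list per microblog post).
    Returns a dict mapping each word to a score in [-1, 1], where values near 1
    suggest a positive candidate and values near -1 a negative candidate.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for tokens in posts:
        token_set = set(tokens)
        has_pos = bool(token_set & POSITIVE_EMOTICONS)
        has_neg = bool(token_set & NEGATIVE_EMOTICONS)
        if has_pos == has_neg:
            # Skip posts with no emoticon or with mixed emoticons.
            continue
        words = token_set - POSITIVE_EMOTICONS - NEGATIVE_EMOTICONS
        (pos_counts if has_pos else neg_counts).update(words)

    scores = {}
    for word in set(pos_counts) | set(neg_counts):
        p, n = pos_counts[word], neg_counts[word]
        if p + n >= min_count:  # assumed frequency filter to drop rare words
            scores[word] = (p - n) / (p + n)
    return scores


if __name__ == "__main__":
    sample_posts = [
        ["great", ":)"],
        ["great", "movie", ":D"],
        ["awful", ":("],
        ["awful", "movie", ":'("],
        ["meh"],
    ]
    print(score_words(sample_posts))
```

In this toy input, words that appear only alongside one polarity of emoticon receive an extreme score, while a word that appears equally with both polarities receives a score of zero and would not be a lexicon candidate.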
