LCCT: A Semi-supervised Model for Sentiment Classification

Min Yang, Kam-Pui Chow, Wenpeng Yin, Ziyu Lu, Wenting Tu

doi:10.3115/v1/n15-1057

Abstract

Analyzing public opinions towards products, services and social events is an important but challenging task. An accurate sentiment analyzer should take both lexicon-level information and corpus-level information into account. It also needs to exploit domain-specific knowledge and utilize the common knowledge shared across domains. In addition, we want the algorithm to be able to deal with missing labels and to learn from incomplete sentiment lexicons. This paper presents an LCCT (Lexicon-based and Corpus-based, Co-Training) model for semi-supervised sentiment classification. The proposed method combines the ideas of lexicon-based learning and corpus-based learning in a unified co-training framework. It is capable of incorporating both domain-specific and domain-independent knowledge. Extensive experiments show that it achieves very competitive classification accuracy, even with a small portion of labeled data. Compared to state-of-the-art sentiment classification methods, the LCCT approach exhibits significantly better performance on a variety of datasets in both English and Chinese.

Highlights

Due to the popularity of opinion-rich resources, people express their opinions all over the Internet
We present a novel semi-supervised sentiment-aware LDA approach to build the lexicon-based classifier, which uses a minimal set of seed words (e.g., “good”, “happy” as positive seeds) as well as document sentiment labels to construct a domain-specific sentiment lexicon

Summary

Introduction

Due to the popularity of opinion-rich resources (e.g., online review sites, forums, blogs and microblogging websites), people express their opinions all over the Internet. Motivated by the demand for gleaning insights from such valuable data, a flurry of research has been devoted to the task of extracting people’s opinions from online reviews. Such opinions could be expressed on products, services, policies, etc. (Pang and Lee, 2008). The lexicon-based approach counts positive and negative terms in a review based on the sentiment dictionary and classifies the document as positive if it contains more positive terms than negative ones. The corpus-based approach uses supervised learning algorithms to train a sentiment classifier.
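The lexicon-based counting rule described above can be illustrated with a minimal Python sketch. This is not the LCCT model or the authors' implementation. The tiny word lists, the function name `classify_by_lexicon`, the regular-expression tokenizer and the choice to label ties as negative are all assumptions made for illustration.

```python
import re

# Hypothetical toy sentiment dictionary (assumption); a real lexicon
# would contain thousands of entries.
POSITIVE_TERMS = {"good", "happy", "great", "excellent"}
NEGATIVE_TERMS = {"bad", "sad", "poor", "terrible"}


def classify_by_lexicon(review: str) -> str:
    """Label a review by comparing counts of positive and negative terms."""
    # Simple lowercase word tokenizer (assumption); real systems would use
    # a proper tokenizer, especially for Chinese text.
    tokens = re.findall(r"[a-z']+", review.lower())
    positive_count = sum(token in POSITIVE_TERMS for token in tokens)
    negative_count = sum(token in NEGATIVE_TERMS for token in tokens)
    # Positive only if strictly more positive terms than negative ones;
    # treating ties as negative is an assumption of this sketch.
    return "positive" if positive_count > negative_count else "negative"


print(classify_by_lexicon("The screen is good and I am happy, but the battery is bad."))
```

A rule like this needs no labeled training data, but it depends entirely on the coverage of the dictionary, whereas a corpus-based classifier learns from labeled documents instead.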

Objectives

Methods

Results