Bootstrap Domain-Specific Sentiment Classifiers from Unlabeled Corpora

Andrius Mudinas, Mark Levene, Dell Zhang

doi:10.1162/tacl_a_00020

Abstract

There is often the need to perform sentiment classification in a particular domain where no labeled document is available. Although we could make use of a general-purpose off-the-shelf sentiment classifier or a pre-built one for a different domain, the effectiveness would be inferior. In this paper, we explore the possibility of building domain-specific sentiment classifiers with unlabeled documents only. Our investigation indicates that in the word embeddings learned from the unlabeled corpus of a given domain, the distributed word representations (vectors) for opposite sentiments form distinct clusters, though those clusters are not transferable across domains. Exploiting such a clustering structure, we are able to utilize machine learning algorithms to induce a quality domain-specific sentiment lexicon from just a few typical sentiment words (“seeds”). An important finding is that simple linear-model-based supervised learning algorithms (such as linear SVM) can actually work better than more sophisticated semi-supervised/transductive learning algorithms, which represent the state-of-the-art technique for sentiment lexicon induction. The induced lexicon could be applied directly in a lexicon-based method for sentiment classification, but a higher performance could be achieved through a two-phase bootstrapping method which first uses the induced lexicon to assign positive/negative sentiment scores to unlabeled documents, and then uses those documents found to have clear sentiment signals as pseudo-labeled examples to train a document sentiment classifier via supervised learning algorithms (such as LSTM). On several benchmark datasets for document sentiment classification, our end-to-end pipelined approach, which is overall unsupervised (except for a tiny set of seed words), outperforms existing unsupervised approaches and achieves an accuracy comparable to that of fully supervised approaches.
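The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional vectors in `EMBEDDINGS` are invented toy values standing in for word embeddings learned from a domain corpus, the seed lists are hypothetical, and a small hinge-loss linear model trained by SGD stands in for the linear SVM. All function names and the `threshold` parameter are assumptions made for illustration. The second phase here stops at producing pseudo-labeled documents; training a document classifier (such as an LSTM) on them is left out.

```python
# Toy stand-ins for domain-specific word embeddings (invented values, assumption).
EMBEDDINGS = {
    "good": (0.9, 0.1), "great": (0.8, 0.2),
    "excellent": (0.85, 0.15), "superb": (0.7, 0.3),
    "bad": (0.1, 0.9), "awful": (0.2, 0.8),
    "terrible": (0.15, 0.85), "dull": (0.3, 0.7),
}
POS_SEEDS = ["good", "great"]   # hypothetical positive seed words
NEG_SEEDS = ["bad", "awful"]    # hypothetical negative seed words


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear(examples, epochs=200, lr=0.1, reg=0.01):
    """SGD on L2-regularised hinge loss: a simplified stand-in for a linear SVM."""
    w = [0.0] * len(examples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:
            margin = y * (dot(w, x) + b)
            w = [wi * (1.0 - lr * reg) for wi in w]  # L2 shrinkage
            if margin < 1.0:                         # example violates the margin
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def induce_lexicon(embeddings, pos_seeds, neg_seeds):
    """Phase 1: train on seed vectors only, then score every word in the vocabulary."""
    examples = [(embeddings[s], 1) for s in pos_seeds]
    examples += [(embeddings[s], -1) for s in neg_seeds]
    w, b = train_linear(examples)
    return {word: dot(w, vec) + b for word, vec in embeddings.items()}


def pseudo_label(docs, lexicon, threshold=0.5):
    """Phase 2 (first half): keep only documents with a clear average lexicon score."""
    labeled = []
    for doc in docs:
        scores = [lexicon[t] for t in doc.lower().split() if t in lexicon]
        if not scores:
            continue  # no sentiment words found, so no signal
        avg = sum(scores) / len(scores)
        if abs(avg) >= threshold:
            labeled.append((doc, 1 if avg > 0 else -1))
    return labeled


if __name__ == "__main__":
    lexicon = induce_lexicon(EMBEDDINGS, POS_SEEDS, NEG_SEEDS)
    docs = ["superb excellent", "dull terrible", "plot was long"]
    for doc, label in pseudo_label(docs, lexicon, threshold=0.3):
        print(label, doc)
```

Non-seed words such as "superb" and "dull" receive scores only because their vectors fall on one side of the boundary learned from the seeds, which is the clustering structure the abstract relies on. Documents without a clear signal are dropped rather than labeled, so the downstream classifier would be trained only on the higher-confidence pseudo-labels.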

Highlights

Sentiment analysis (Liu, 2015) is a popular research topic which has a wide range of applications, such as summarizing customer reviews, monitoring social media, and predicting stock market trends (Bollen et al., 2011).
How far can we go in sentiment classification for a new domain, given only unlabeled data? This paper presents our exploration towards answering this research question.
We have formulated the cluster hypothesis for sentiment analysis and verified that, in general, it holds for word embeddings within a specific domain but not across domains.

Summary

Introduction

Sentiment analysis (Liu, 2015) is a popular research topic which has a wide range of applications, such as summarizing customer reviews, monitoring social media, and predicting stock market trends (Bollen et al., 2011). There have been some studies on domain adaptation or transfer learning for sentiment classification (Blitzer et al., 2007; Tan et al., 2009; Pan et al., 2010; Glorot et al., 2011; Yoshida et al., 2011; Bollegala et al., 2013; Xia et al., 2013; Yang and Eisenstein, 2015), but they still require a large amount of labeled training data from a fairly similar source domain, which is not always feasible. Those algorithms tend to be computationally expensive and time-consuming (Mohammad and Turney, 2010; Fast et al., 2016).

Related Work

Domain-Specific Sentiment Word Embedding

Domain-Specific Sentiment Lexicon Induction

Domain-Specific Sentiment Classification of Documents

Sentiment Classification of Long Texts

Sentiment Classification of Short Texts

Detecting Neutral Sentiment

Findings

Conclusions


Journal: Transactions of the Association for Computational Linguistics	Publication Date: Dec 1, 2018
Citations: 63	License type: cc-by


