Domain-specific sentiment classification via fusing sentiment knowledge from multiple sources

Fangzhao Wu, Yongfeng Huang, Zhigang Yuan

doi:10.1016/j.inffus.2016.09.001

Abstract

Analyzing the sentiments in massive user-generated online data, such as product reviews and microblogs, has become a hot research topic. It can help customers, companies, and expert systems make more informed decisions. Sentiment analysis is widely known to be a domain-dependent problem: different domains usually have different sentiment expressions, and a general sentiment classifier is not suitable for all domains. A natural solution is to train a domain-specific sentiment classifier for each target domain. However, the labeled data in the target domain is usually insufficient, and annotating enough samples is costly and time-consuming. To tackle this problem, we propose a novel approach that trains domain-specific sentiment classifiers by fusing sentiment knowledge from multiple sources. Sentiment information from four sources is extracted and fused in our approach. The first source is sentiment lexicons, which contain the sentiment polarities of general sentiment words. The second source is the sentiment classifiers of multiple source domains. The third source is the unlabeled data in the target domain, from which we extract domain-specific sentiment relations among words. The fourth source is the labeled data in the target domain. We propose a unified framework to fuse these four kinds of sentiment knowledge and train a domain-specific sentiment classifier for the target domain. In addition, we present an efficient optimization algorithm to solve the model of our approach. Extensive experiments are conducted on both an Amazon product review dataset and a Twitter dataset. Experimental results show that by fusing the sentiment information extracted from multiple sources, our approach can effectively improve the performance of sentiment classification and reduce the dependence on labeled data. For instance, our approach achieves an accuracy of 87.22% in the Kitchen domain when only 200 samples in the target domain are labeled. The performance improvements of our approach over a purely supervised sentiment classifier are 8.98% and 7.92% on the Amazon and Twitter datasets, respectively.
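
To make the idea of fusing four knowledge sources concrete, the sketch below shows one simple way such a fusion could be written down. It is an illustration under stated assumptions, not the model or the optimization algorithm of the paper, neither of which is specified in the abstract. It assumes a linear scorer w over a bag-of-words vocabulary, a squared loss on the labeled target-domain data, and three quadratic regularizers: one pulling w toward lexicon polarities, one pulling w toward the average of source-domain classifier weights, and a graph-Laplacian term that encourages words related in the unlabeled target-domain data to receive similar weights. All function names, parameter names, and the toy data are hypothetical.

```python
import numpy as np

def fuse_and_train(X, y, lexicon_prior, source_weights, word_affinity,
                   alpha=0.1, beta=0.1, gamma=1.0):
    """Hypothetical fusion objective (an assumption, not the paper's model):

        (1/n) * ||X w - y||^2            labeled target-domain data
        + alpha * ||w - lexicon_prior||^2  sentiment lexicon polarities
        + beta  * ||w - mean(source)||^2   source-domain classifiers
        + gamma * w^T L w                  word relations from unlabeled data

    The objective is quadratic, so the minimizer solves a linear system.
    """
    n, d = X.shape
    # Symmetrize the word-word affinity and build the graph Laplacian L = D - S.
    S = (word_affinity + word_affinity.T) / 2.0
    L = np.diag(S.sum(axis=1)) - S
    # Average the weight vectors of the source-domain classifiers.
    w_src = np.mean(source_weights, axis=0)
    # Setting the gradient to zero gives A w = b.
    A = X.T @ X / n + (alpha + beta) * np.eye(d) + gamma * L
    b = X.T @ y / n + alpha * lexicon_prior + beta * w_src
    return np.linalg.solve(A, b)

def predict(X, w):
    """Return +1 for positive and -1 for negative sentiment."""
    return np.where(X @ w >= 0, 1, -1)

if __name__ == "__main__":
    # Toy vocabulary: great, terrible, sturdy, flimsy.
    lexicon_prior = np.array([1.0, -1.0, 0.0, 0.0])   # lexicon covers general words only
    source_weights = np.array([[0.8, -0.9, 0.0, 0.0],
                               [1.0, -0.7, 0.0, 0.0]])
    # Assumed relations mined from unlabeled reviews:
    # "sturdy" behaves like "great", "flimsy" like "terrible".
    word_affinity = np.zeros((4, 4))
    word_affinity[0, 2] = word_affinity[2, 0] = 1.0
    word_affinity[1, 3] = word_affinity[3, 1] = 1.0
    # Two labeled target-domain samples that never mention the domain words.
    X = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])
    y = np.array([1.0, -1.0])

    w = fuse_and_train(X, y, lexicon_prior, source_weights, word_affinity)
    X_new = np.array([[0.0, 0.0, 1.0, 0.0],    # review containing only "sturdy"
                      [0.0, 0.0, 0.0, 1.0]])   # review containing only "flimsy"
    print(predict(X_new, w))
```

In this toy setting the domain-specific words "sturdy" and "flimsy" never appear in the labeled samples, yet they still receive positive and negative weights respectively, because the Laplacian term propagates polarity from related general words. This mirrors the abstract's claim that fusing several sources reduces the dependence on labeled data, although the trade-off weights alpha, beta, and gamma here are arbitrary.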

Full Text