Abstract

Existing sentiment classifiers are usually built for a single specific language, and different languages require different classification models. In this paper we aim to build a universal sentiment classifier that uses a single classification model across multiple languages. To achieve this goal, we propose to jointly learn multilingual sentiment-aware word embeddings based only on labeled reviews in English and unlabeled parallel data available for a few language pairs. Parallel data is not required between English and every other language, because the sentiment information can be transferred into any language via pivot languages. We present evaluation results of our universal sentiment classifier in five languages, and the results are very promising even when no parallel data between English and the target languages is used. Furthermore, we compare the universal single classifier with several cross-language sentiment classifiers that rely on direct parallel data between the source and target languages, and the results show that the performance of our universal sentiment classifier is competitive with that of these cross-language classifiers across multiple target languages.
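As a rough illustration of the idea described above (not the paper's actual architecture), the sketch below trains one shared classifier over per-language embedding tables that all map into a single space: a sentiment loss on labeled English reviews makes the embeddings sentiment-aware, while a simple alignment loss on unlabeled parallel sentence pairs pulls the languages together. The class names, the averaging sentence encoder, and the squared-distance alignment term are assumptions made for illustration only.

```python
# Minimal sketch (not the paper's exact model): a single sentiment classifier
# over a shared multilingual embedding space. Embeddings become
# "sentiment-aware" through a classification loss on labeled English reviews
# and are aligned across languages by an L2 loss on parallel sentence pairs.
import torch
import torch.nn as nn


class SharedSpaceClassifier(nn.Module):
    def __init__(self, vocab_sizes, dim=128, n_classes=2):
        super().__init__()
        # One embedding table per language, all mapping into the same space.
        self.embeds = nn.ModuleDict(
            {lang: nn.Embedding(size, dim) for lang, size in vocab_sizes.items()}
        )
        # A single classifier shared by every language.
        self.clf = nn.Linear(dim, n_classes)

    def encode(self, lang, token_ids):
        # Average word embeddings as a very simple sentence representation.
        return self.embeds[lang](token_ids).mean(dim=1)

    def forward(self, lang, token_ids):
        return self.clf(self.encode(lang, token_ids))


def training_losses(model, en_reviews, en_labels, parallel_batches):
    """Sentiment loss on labeled English data + alignment loss on parallel data.

    parallel_batches: list of (lang_a, ids_a, lang_b, ids_b) sentence pairs.
    """
    sent_loss = nn.functional.cross_entropy(model("en", en_reviews), en_labels)
    align_loss = sum(
        (model.encode(la, ia) - model.encode(lb, ib)).pow(2).sum(dim=1).mean()
        for la, ia, lb, ib in parallel_batches
    )
    return sent_loss + align_loss
```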

Highlights

  • Nowadays, a large amount of user-generated content (UGC) appears online every day, such as tweets, comments and product reviews

  • The Bilingual Model (BM) relies on the direct parallel data between the source and target languages, and it generally works slightly better than the other models, including the PMDB model and the UMM model

  • The results demonstrate that the pivot-driven model is very effective for learning bilingual/trilingual sentiment-aware word embeddings (a rough illustration of the pivot idea follows this list)
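Reusing the hypothetical `SharedSpaceClassifier` and `training_losses` names from the sketch above, the snippet below illustrates how pivot-based transfer could look in that simplified setting: French is connected to English only through German parallel data, yet the English-trained classifier can then score French reviews directly. The choice of languages, tensor shapes, and random data are placeholders, not the paper's experimental setup.

```python
import torch

# Hypothetical pivot setup (illustrative only): English-German and
# German-French parallel sentences exist, but no English-French pairs.
model = SharedSpaceClassifier({"en": 5000, "de": 5000, "fr": 5000})
en_reviews = torch.randint(0, 5000, (8, 20))   # labeled English reviews
en_labels = torch.randint(0, 2, (8,))          # their sentiment labels
parallel_batches = [
    ("en", torch.randint(0, 5000, (8, 20)), "de", torch.randint(0, 5000, (8, 20))),
    ("de", torch.randint(0, 5000, (8, 20)), "fr", torch.randint(0, 5000, (8, 20))),
]

loss = training_losses(model, en_reviews, en_labels, parallel_batches)
loss.backward()  # one illustrative training step (optimizer omitted)

# At test time the English-trained classifier scores French reviews directly,
# because French embeddings now share the space via the German pivot.
french_reviews = torch.randint(0, 5000, (4, 20))
predictions = model("fr", french_reviews).argmax(dim=1)
```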


Summary

Introduction

A large amount of user-generated content (UGC) appears online every day, such as tweets, comments and product reviews. Sentiment classification on these data has become a popular research topic over the past few years (Pang et al., 2002; Blitzer et al., 2007; Agarwal et al., 2011; Liu, 2012). Most existing sentiment classifiers rely on labeled training data, and the data are usually language-dependent. Labeled training data for sentiment classification are not available, or not easy to obtain, in many languages of the world (e.g., Malaysian, Mongolian, Uighur). It is hard to build a sentiment classifier in these resource-poor languages.


