Abstract

Cross-lingual sentiment learning is becoming increasingly important because user-generated content on social media is multilingual and resources for languages other than English are scarce. The task is challenging, however, for two reasons: translated data and original data follow different distributions, and there is a language gap, i.e. each language has its own ways of expressing sentiment. This work explores the adaptation of English sentiment analysis resources to a new language, Arabic. The aim is to design a lightweight model for cross-lingual sentiment classification from English to Arabic that requires no manual annotation, is easy to build, and does not depend on deep linguistic analysis. The ultimate goal is to find an optimal baseline model and to determine the relation between the noise in the translated data and the accuracy of sentiment classification. To find that baseline, different configurations of several factors are investigated, including feature representation, feature reduction methods, and learning algorithms. Experiments show that a good classification model can be obtained from translated data despite the artificial noise added by machine translation. The results also show a significant cost to automation, and thus the best path to future enhancement is through the inclusion of language-specific knowledge and resources.

Highlights

  • Given the quantity and massive popularity of multilingual user-generated content on social media, the need for effective multilingual and cross-lingual sentiment analysis is becoming increasingly important

  • Since we sought to study the ability of machine translation to generate reliable training data that can be used for Arabic sentiment analysis, several experiments were conducted to evaluate extensively different configurations of feature representation, feature weighting, and classification algorithms

  • The classification accuracies achieved with these configurations on the Books, DVDs, Electronics, and Kitchen datasets are shown in Tables II, III, IV, and V, respectively, for each of the three classification methods
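
The configuration search described in the highlights can be illustrated with a minimal sketch. The text above does not name the specific feature representations, reduction methods, or classifiers, so everything concrete here is an assumption made for illustration: binary versus term-frequency features, a document-frequency cut-off as the reduction step, a multinomial Naive Bayes classifier, and a handful of toy English placeholder reviews standing in for machine-translated training data. All function and variable names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Toy stand-in for machine-translated training reviews (hypothetical data);
# whitespace-separated placeholder tokens rather than real Arabic text.
TRAIN = [
    ("good great product love", "pos"),
    ("great quality good price", "pos"),
    ("love it works great", "pos"),
    ("bad poor product broke", "neg"),
    ("poor quality bad price", "neg"),
    ("hate it broke fast", "neg"),
]
TEST = [("good quality love it", "pos"), ("bad product broke", "neg")]


def featurize(text, representation):
    """Map tokens to values: 'binary' records presence, 'tf' records counts."""
    counts = Counter(text.split())
    if representation == "binary":
        return {tok: 1 for tok in counts}
    return dict(counts)


def select_vocabulary(docs, k):
    """Feature reduction (assumed): keep the k tokens with highest document frequency."""
    df = Counter(tok for text, _ in docs for tok in set(text.split()))
    ranked = sorted(df, key=lambda tok: (-df[tok], tok))  # ties broken alphabetically
    return set(ranked[:k])


def train_nb(docs, representation, vocab):
    """Collect per-class token totals for multinomial Naive Bayes."""
    label_counts = Counter(label for _, label in docs)
    token_totals = {label: Counter() for label in label_counts}
    for text, label in docs:
        for tok, val in featurize(text, representation).items():
            if tok in vocab:
                token_totals[label][tok] += val
    return label_counts, token_totals, vocab


def predict(model, text, representation):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    label_counts, token_totals, vocab = model
    n_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        total = sum(token_totals[label].values())
        score = math.log(label_counts[label] / n_docs)
        for tok, val in featurize(text, representation).items():
            if tok in vocab:
                p = (token_totals[label][tok] + 1) / (total + len(vocab))
                score += val * math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def evaluate_grid(train, test, representations=("binary", "tf"), vocab_sizes=(5, 50)):
    """Accuracy for every (representation, vocabulary size) configuration."""
    results = {}
    for rep, k in product(representations, vocab_sizes):
        model = train_nb(train, rep, select_vocabulary(train, k))
        correct = sum(predict(model, text, rep) == label for text, label in test)
        results[(rep, k)] = correct / len(test)
    return results


if __name__ == "__main__":
    for config, accuracy in sorted(evaluate_grid(TRAIN, TEST).items()):
        print(config, accuracy)
```

In a setting like the one described, the training set would be translated English reviews and the test set original Arabic reviews, so that the grid reflects how each configuration copes with translation noise. The toy data here only demonstrates the mechanics.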


Summary

Introduction

Given the quantity and massive popularity of multilingual user-generated content on social media, the need for effective multilingual and cross-lingual sentiment analysis is becoming increasingly important. Sentiment lexica and corpora are considered the most valuable resources for the sentiment classification task, but they are unevenly distributed across the world’s languages. Because most previous work focuses on English sentiment classification, many annotated sentiment lexica and corpora for English, covering various domains, are freely available on the Web. Annotated resources for many other languages are scarce, and manually labeling a rich and reliable sentiment lexicon or corpus in those languages is time-consuming [3]-[5]. Efforts to build sentiment analysis methods for other languages have been hampered by the high cost of creating corpora and lexical resources for a new language. We focus on the problem of English-to-Arabic cross-lingual sentiment classification, leveraging only English sentiment resources to classify Arabic product reviews, without using any Arabic sentiment resources.

