Abstract

The lack of sentiment resources in resource-poor languages poses challenges for machine-learning-based sentiment analysis. Cross-lingual and semi-supervised learning approaches are the most common ways to overcome this issue. However, the performance of existing methods degrades due to the poor quality of translated resources, data sparseness and, more specifically, language divergence. We propose an integrated learning model that combines semi-supervised and ensemble learning while using the available sentiment resources to tackle issues related to language divergence. Additionally, to reduce the impact of translation errors and to handle the instance selection problem, we propose a clustering-based bee-colony sample selection method for the optimal selection of the most distinguishing features representing the target data. To evaluate the proposed model, various experiments are conducted on an English-Arabic cross-lingual data set. Simulation results demonstrate that the proposed model outperforms the baseline approaches in terms of classification performance. Furthermore, the statistical outcomes indicate the advantages of the proposed training data sampling and target-based feature selection in reducing the negative effect of translation errors. These results show that the proposed approach achieves performance close to that of in-language supervised models.

Highlights

  • With the development of the Web 3.0 era and artificial intelligence (AI), an increasing amount of multilingual user-generated content is available that expresses users’ views, feedback, or comments on aspects such as product quality, services, and government policies

  • Normalize Yp. Repeat the above steps n times, starting from step 3, to build n trained semi-supervised models, each trained with a different feature set. Step (2): the n semi-supervised classifiers vote to determine the final labels for the unlabeled data Yp

  • The experimental results using the LR, NB, and Maximum Entropy (ME) classifiers and a voting ensemble on the Books (B), DVDs (D), Electronics (E), and Kitchen Appliances (K) domains are summarized in Table 2 and Figure 2
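The second highlight describes building n semi-supervised models, each on a different feature set, and letting them vote on the labels of the unlabeled data. The following is a minimal sketch of that idea, not the authors' implementation. This summary does not specify the semi-supervised learner, so plain self-training with a nearest-centroid classifier is used as a stand-in. Random feature subsets replace the paper's target-based feature selection. All function names and parameters (`self_train`, `ensemble_vote`, `min_margin`, `feats_per_model`) are hypothetical.

```python
import random
from collections import Counter


def centroid_fit(X, y, feats):
    """Compute one mean vector per class, restricted to the chosen feature indices."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += row[f]
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def centroid_predict(model, row, feats):
    """Return (label, margin): the nearest centroid and the squared-distance gap to the runner-up."""
    dists = sorted(
        (sum((row[f] - m[j]) ** 2 for j, f in enumerate(feats)), c)
        for c, m in model.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def self_train(X_lab, y_lab, X_unlab, feats, rounds=3, min_margin=0.5):
    """Self-training (assumed stand-in): add confidently pseudo-labelled rows, then refit."""
    X_train, y_train = list(X_lab), list(y_lab)
    pool = list(range(len(X_unlab)))
    for _ in range(rounds):
        model = centroid_fit(X_train, y_train, feats)
        remaining = []
        for i in pool:
            label, margin = centroid_predict(model, X_unlab[i], feats)
            if margin >= min_margin:
                X_train.append(X_unlab[i])
                y_train.append(label)
            else:
                remaining.append(i)
        if len(remaining) == len(pool):  # nothing new was added; stop early
            break
        pool = remaining
    return centroid_fit(X_train, y_train, feats)


def ensemble_vote(X_lab, y_lab, X_unlab, n_models=5, feats_per_model=2, seed=0):
    """Train n_models self-trained classifiers on different feature subsets and majority-vote."""
    rng = random.Random(seed)
    n_features = len(X_lab[0])
    votes = [Counter() for _ in X_unlab]
    for _ in range(n_models):
        # Random subset as a placeholder for a real feature selection method.
        feats = rng.sample(range(n_features), feats_per_model)
        model = self_train(X_lab, y_lab, X_unlab, feats)
        for i, row in enumerate(X_unlab):
            label, _ = centroid_predict(model, row, feats)
            votes[i][label] += 1
    return [v.most_common(1)[0][0] for v in votes]
```

Calling `ensemble_vote(X_lab, y_lab, X_unlab)` with lists of numeric feature vectors returns one voted label per unlabeled row. Using an odd `n_models` avoids tied votes in the two-class case.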


Summary

Introduction

With the development of the Web 3.0 era and artificial intelligence (AI), an increasing amount of multilingual user-generated content is available that expresses users’ views, feedback, or comments on aspects such as product quality, services, and government policies. To overcome the annotation cost, various solutions have been proposed in the literature that either exploit unlabeled data in the target language (semi-supervised learning) [1] or explore translated models and/or data available in other languages (transfer learning) [3,4,5,9]. The lack of annotated resources in the majority of languages has motivated research toward cross-lingual approaches to sentiment analysis. SCLL techniques attempt to make use of existing annotated sentiment resources from a resource-rich language domain (i.e., genre and/or different topics). These approaches employ machine translation (from target to source language, from source to target, or in both directions, which is referred to as bidirectional), bilingual lexicons, or cross-lingual representation learning techniques with parallel corpora to project the labeled data from the source to the target language [1,3,9,10].

Related Studies
Challenges of Cross-Lingual Sentiment Analysis
Main Techniques for Instance Selection and Feature Weighting
The Proposed Method
Clustering Based on Bee-Colony Training Instance Selection
Clustering Target Language Data
Improved Artificial Bee-Colony Training Selection
Target-Based Feature Selection Methods
Ensemble Supervised Learning
Integrating Prior Supervised Information with Semi-Supervised
Semi-Supervised Learning
Experimental Design
Result and Discussion
Findings
Performance
