Abstract
The lack of sentiment resources in low-resource languages poses challenges for machine-learning-based sentiment analysis. Cross-lingual and semi-supervised learning are the most common approaches for overcoming this issue. However, the performance of existing methods degrades because of the poor quality of translated resources, data sparseness and, more specifically, language divergence. We propose an integrated learning model that combines semi-supervised learning with an ensemble model and exploits the available sentiment resources to tackle language-divergence issues. In addition, to reduce the impact of translation errors and handle the instance-selection problem, we propose a clustering-based bee-colony sample-selection method for the optimal selection of the most distinguishing features representing the target data. To evaluate the proposed model, various experiments are conducted on an English-Arabic cross-lingual data set. The results demonstrate that the proposed model outperforms the baseline approaches in terms of classification performance. Furthermore, the statistical outcomes indicate the advantages of the proposed training-data sampling and target-based feature selection in reducing the negative effect of translation errors. These results highlight that the proposed approach achieves performance close to that of in-language supervised models.
Highlights
With the development of the Web 3.0 era and artificial intelligence (AI), an increasing amount of multilingual user-generated content is available that expresses users' views, feedback or comments on various aspects such as product quality, services, and government policies.
Normalize Yp; repeat the above steps n times from step 3 to build n trained semi-supervised models, each trained with a different feature set. Step (2): the n semi-supervised classifiers vote to determine the final labels for the unlabeled data Yp (a minimal sketch of this voting step follows the highlights below).
The experimental results using a voting ensemble of LR, NB and Maximum Entropy (ME) classifiers on the Books (B), DVDs (D), Electronics (E), and Kitchen Appliances (K) domains are summarized in Table 2 and Figure 2.
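The ensemble step described in the highlights can be illustrated with a short sketch. The code below is a hypothetical simplification, not the authors' implementation: it builds n self-training classifiers, each on a different feature subset, and combines their predictions on the unlabeled target data Yp by majority vote. The function name, the use of scikit-learn's SelfTrainingClassifier with LogisticRegression, and the assumption of non-negative integer class labels are all illustrative choices.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

def ensemble_pseudo_label(X_labeled, y_labeled, X_unlabeled, feature_subsets):
    """Vote over n self-trained models, each using a different feature subset.

    Assumes class labels are non-negative integers (e.g., 0 = negative, 1 = positive).
    """
    votes = []
    for cols in feature_subsets:
        # -1 marks unlabeled instances for scikit-learn's self-training wrapper
        X = np.vstack([X_labeled[:, cols], X_unlabeled[:, cols]])
        y = np.concatenate([y_labeled, -np.ones(len(X_unlabeled), dtype=int)])
        model = SelfTrainingClassifier(LogisticRegression(max_iter=1000))
        model.fit(X, y)
        votes.append(model.predict(X_unlabeled[:, cols]))
    votes = np.stack(votes)  # shape: (n_models, n_unlabeled)
    # Majority vote over the n models determines the final label for each unlabeled instance
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

In this sketch the number of models n is simply the number of feature subsets supplied; the paper's target-based feature selection would determine those subsets, which are passed in here as plain column-index lists.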
Summary
With the development of the Web 3.0 era and artificial intelligence (AI), an increasing amount of multilingual user-generated content is available that expresses users' views, feedback or comments on various aspects such as product quality, services, and government policies. To overcome the annotation cost, various solutions have been proposed in the literature to exploit the unlabeled data in the target language (semi-supervised learning) [1], or to explore translated models and/or data available in other languages (transfer learning) [3,4,5,9]. The lack of these annotated resources in the majority of languages has motivated research toward cross-lingual approaches for sentiment analysis. SCLL techniques attempt to make use of existing annotated sentiment resources from a resource-rich language domain (i.e., a different genre and/or topic). These approaches employ machine translation (from the target to the source language, from the source to the target, or in both directions, referred to as bidirectional), bilingual lexicons, or cross-lingual representation learning techniques with parallel corpora to project the labeled data from the source to the target language [1,3,9,10].
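As a concrete reference point for the translation-based projection described above, the following minimal sketch translates labeled English reviews into the target language and trains a standard classifier on the translated text. It is an assumption-laden illustration rather than the paper's method: translate_en_to_ar is a placeholder for any machine-translation system, and the TF-IDF plus logistic-regression pipeline is an arbitrary but common baseline choice.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def translate_en_to_ar(texts):
    # Placeholder: plug in an actual machine-translation model or service here.
    raise NotImplementedError

def train_cross_lingual(source_texts_en, source_labels, target_texts_ar):
    # Project the labeled source-language data into the target language via MT
    translated = translate_en_to_ar(source_texts_en)
    # Train a target-language classifier on the translated, labeled reviews
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(translated, source_labels)
    # Predict sentiment labels for the unlabeled target-language reviews
    return clf.predict(target_texts_ar)

Translation noise in the projected training data is exactly the weakness the paper targets with its training-data sampling and target-based feature selection.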