The Impact of Translating Resource-Rich Datasets to Low-Resource Languages Through Multi-Lingual Text Processing

Abdul Ghafoor, Zenun Kastrati, Abdullah Abdullah, Sher Muhammad Daudpota, Rakhi Batra, Mudasir Ahmad Wani, Ali Shariq Imran

doi:10.1109/access.2021.3110285

Abstract

Urdu is still considered a low-resource language despite being ranked as the world's 10th most spoken language, with nearly 230 million speakers. The scarcity of benchmark datasets in low-resource languages has led researchers to adopt more ingenious techniques to address the issue. One widely adopted option is to use language translation services to replicate existing datasets from resource-rich languages, such as English, in low-resource languages, such as Urdu. For most natural language processing tasks, including polarity assessment, words translated from one language to another via Google Translate often change in meaning. This results in a polarity shift that degrades system performance, particularly for sentiment classification and emotion detection tasks. This study evaluates the effect of translation from a resource-rich language to a low-resource language on the sentiment classification task. It identifies the words that cause polarity shift and groups them into five distinct categories. It further examines the correlation between languages with similar roots. Our study shows a performance degradation of 2-3 percentage points in sentiment classification due to polarity shift resulting from translation from resource-rich languages to low-resource languages.
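The two quantities the abstract refers to, polarity shift and degradation in percentage points, can be made concrete with a small sketch. The code below is an illustration only and is not the authors' evaluation pipeline. The function names (`accuracy`, `polarity_shift_rate`, `degradation_pp`) are hypothetical, and it assumes sentiment labels have already been predicted once for the source-language text and once for its machine translation.

```python
from typing import Sequence


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one example is required")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def polarity_shift_rate(source_pred: Sequence[str],
                        translated_pred: Sequence[str]) -> float:
    """Fraction of examples whose predicted polarity changes after translation."""
    if len(source_pred) != len(translated_pred):
        raise ValueError("both prediction lists must have the same length")
    if not source_pred:
        raise ValueError("at least one example is required")
    changed = sum(s != t for s, t in zip(source_pred, translated_pred))
    return changed / len(source_pred)


def degradation_pp(gold: Sequence[str],
                   source_pred: Sequence[str],
                   translated_pred: Sequence[str]) -> float:
    """Accuracy lost after translation, expressed in percentage points."""
    return 100.0 * (accuracy(gold, source_pred) - accuracy(gold, translated_pred))


if __name__ == "__main__":
    # Made-up toy labels for illustration; these are NOT results from the paper.
    gold = ["pos", "neg", "pos", "neg", "pos"]
    source_pred = ["pos", "neg", "pos", "neg", "neg"]
    translated_pred = ["pos", "neg", "neg", "neg", "neg"]
    print("polarity shift rate:", polarity_shift_rate(source_pred, translated_pred))
    print("degradation (pp):", round(degradation_pp(gold, source_pred, translated_pred), 2))
```

Reporting the drop in percentage points, as the abstract does, is the plain difference between two accuracy percentages rather than a relative change, which is why `degradation_pp` subtracts the accuracies before scaling by 100.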


Journal: IEEE Access	Publication Date: Jan 1, 2021
Citations: 16	License type: CC BY-NC-ND 4.0


