Abstract

Learning the inherent meaning of a word in Natural Language Processing (NLP) has motivated researchers to represent words at various levels of abstraction, namely character-level, morpheme-level, and subword-level vector representations. Syllable-Aware Word Embeddings (SAWE) can effectively handle NLP tasks involving agglutinative and fusional languages. However, research assessing SAWE on such extrinsic NLP tasks has been scarce, especially for low-resource languages in the context of code-mixing with English. This article proposes a model that learns SAWE to extract semantics from fine-grained subunits of a word, and the representative ability of the embeddings is assessed through sentiment analysis of code-mixed Telugu-English review corpora. Multilingual societies and advances in communication technologies have led to the prolific use of code-mixed data, which renders State-of-the-Art (SOTA) sentiment analysis models developed on monolingual data ineffective. Social media users in the Indian subcontinent tend to mix English and their respective native language (written phonetically in the English script) when expressing opinions or sentiments. Code-mixing allows borrowing words from a foreign language, shorthand notation, vowel elongation, and word usage that disregards syntactic/grammatical rules, all of which make sentiment analysis of code-mixed data challenging. Deep neural architectures such as Long Short-Term Memory and Gated Recurrent Unit networks have proven effective for several NLP tasks, including sequence labeling, named entity recognition, and machine translation. This article implements a framework for sentiment analysis of a code-mixed Telugu-English review corpus. Both word embeddings and SAWE are fed into a unified deep neural network consisting of a two-level Bidirectional Long Short-Term Memory/Gated Recurrent Unit network with Softmax as the output layer. The proposed model leverages the advantages of both word embeddings and SAWE, enabling it to outperform existing SOTA code-mixed sentiment analysis models on a Telugu-English code-mixed dataset from the International Institute of Information Technology–Hyderabad and on a dataset curated by the authors. On these datasets, the proposed model improves on the best-performing SOTA model by 3% in F1-score and 2% in accuracy, and by 7% in F1-score and 5% in accuracy, respectively.
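
A minimal sketch of the described architecture is given below, assuming PyTorch. All class names, embedding dimensions, and the three-class output are illustrative assumptions rather than the paper's actual configuration, and the syllable-aware embeddings are simplified to a precomputed per-token lookup instead of the proposed SAWE learner. The sketch shows the core idea: word and syllable-aware vectors are concatenated and passed through a two-layer bidirectional LSTM, with a softmax output layer on top.

```python
import torch
import torch.nn as nn

class CodeMixedSentimentModel(nn.Module):
    """Hypothetical sketch: two-level BiLSTM over concatenated word and
    syllable-aware embeddings, with a softmax output layer."""

    def __init__(self, word_vocab, syll_vocab, word_dim=300, syll_dim=100,
                 hidden_dim=128, num_classes=3):
        super().__init__()
        # Word embeddings and (assumed precomputed) syllable-aware embeddings.
        self.word_emb = nn.Embedding(word_vocab, word_dim)
        self.syll_emb = nn.Embedding(syll_vocab, syll_dim)
        # Two stacked bidirectional LSTM layers; swapping nn.LSTM for
        # nn.GRU gives the GRU variant mentioned in the abstract.
        self.encoder = nn.LSTM(word_dim + syll_dim, hidden_dim,
                               num_layers=2, bidirectional=True,
                               batch_first=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, word_ids, syll_ids):
        # word_ids, syll_ids: (batch, seq_len) token-aligned index tensors.
        x = torch.cat([self.word_emb(word_ids),
                       self.syll_emb(syll_ids)], dim=-1)
        out, _ = self.encoder(x)          # (batch, seq_len, 2 * hidden_dim)
        sentence = out[:, -1, :]          # last time step as sentence summary
        # Softmax output layer; in training one would typically feed the
        # raw logits to nn.CrossEntropyLoss instead.
        return torch.softmax(self.classifier(sentence), dim=-1)

# Illustrative usage with random indices (vocabulary sizes are assumptions).
model = CodeMixedSentimentModel(word_vocab=20000, syll_vocab=5000)
word_ids = torch.randint(0, 20000, (4, 25))
syll_ids = torch.randint(0, 5000, (4, 25))
probs = model(word_ids, syll_ids)         # shape: (4, 3)
```

Concatenating the two embedding streams before encoding is one straightforward way to let the network draw on both word-level and syllable-level semantics, which is the combination the abstract credits for the reported improvements.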
