Zyy1510 Team at SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text with Sub-word Level Representations

Yueying Zhu, Kunjie Dong, Hongling Li, Xiaobing Zhou

doi:10.18653/v1/2020.semeval-1.183

Abstract

This paper reports the zyy1510 team’s work in the International Workshop on Semantic Evaluation (SemEval-2020) shared task on Sentiment Analysis for Code-Mixed (Hindi-English, English-Spanish) Social Media Text. The task is to determine the polarity of a text, classifying it as positive, negative, or neutral. To this end, we propose an ensemble of a word n-gram-based Multinomial Naive Bayes (MNB) model and an LSTM over sub-word level representations (Sub-word LSTM) to identify the sentiment of Hindi-English and English-Spanish code-mixed data. The ensemble combines the rich sequential patterns and the intermediate post-convolution features of the LSTM model with the keyword polarity captured by the MNB model to obtain the final sentiment score. We tested our system on the Hindi-English and English-Spanish code-mixed social media data sets released for the task. Our model achieves an F1 score of 0.647 on the Hindi-English task and 0.682 on the English-Spanish task.
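As a rough illustration of the ensemble idea described in the abstract, the sketch below trains a small word n-gram Multinomial Naive Bayes classifier in plain Python and blends its class probabilities with those of a second model. Several things here are assumptions, not details from the paper: the class and function names (`NgramMNB`, `word_ngrams`, `ensemble`), the use of unigrams plus bigrams, add-one smoothing, the weighted-average combination rule, and the toy training sentences. The Sub-word LSTM is not implemented; its output is represented by a placeholder probability vector.

```python
import math
from collections import Counter, defaultdict

LABELS = ["negative", "neutral", "positive"]

def word_ngrams(text, n_max=2):
    """Return all word 1..n_max-grams of a lower-cased, whitespace-tokenised text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

class NgramMNB:
    """Multinomial Naive Bayes over word n-gram counts (add-one smoothing)."""

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.docs = Counter(labels)         # label -> number of training texts
        for text, label in zip(texts, labels):
            self.counts[label].update(word_ngrams(text))
        self.vocab = {g for c in self.counts.values() for g in c}
        return self

    def predict_proba(self, text):
        """Return class probabilities in the order given by LABELS."""
        total_docs = sum(self.docs.values())
        log_scores = []
        for label in LABELS:
            # Smoothed log prior, so a label with no training texts is still valid.
            score = math.log((self.docs[label] + 1) / (total_docs + len(LABELS)))
            denom = sum(self.counts[label].values()) + len(self.vocab)
            for gram in word_ngrams(text):
                if gram in self.vocab:  # n-grams never seen in training are ignored
                    score += math.log((self.counts[label][gram] + 1) / denom)
            log_scores.append(score)
        # Normalise the log scores into probabilities (numerically stable softmax).
        top = max(log_scores)
        exps = [math.exp(s - top) for s in log_scores]
        z = sum(exps)
        return [e / z for e in exps]

def ensemble(mnb_probs, lstm_probs, weight=0.5):
    """Weighted average of two probability vectors (an assumed combination rule)."""
    mixed = [weight * m + (1 - weight) * l for m, l in zip(mnb_probs, lstm_probs)]
    return LABELS[mixed.index(max(mixed))], mixed

# Toy Hindi-English examples, invented for illustration only.
train_texts = ["movie bahut accha tha", "yeh film bakwas hai", "kal milte hain"]
train_labels = ["positive", "negative", "neutral"]
mnb = NgramMNB().fit(train_texts, train_labels)

mnb_probs = mnb.predict_proba("film bahut accha")
lstm_probs = [0.2, 0.3, 0.5]  # placeholder standing in for a Sub-word LSTM output
label, probs = ensemble(mnb_probs, lstm_probs)
print(label, [round(p, 3) for p in probs])
```

Averaging probabilities is only one way to merge the two models; a learned weighting or a meta-classifier over both outputs would fit the same description.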

Highlights

Mixing languages, known as code-mixing, is a norm in multilingual societies
Code-mixed social media texts generally take three forms: i) Mixed script: a combination of native and Roman script; ii) Code-mixed script: native and English languages both written in Roman script; iii) Native script: local languages written in their native script
Beyond the challenges of general sentiment analysis, code-mixed texts pose additional difficulties for natural language processing (NLP) tasks

Summary

Introduction

Mixing languages, known as code-mixing, is a norm in multilingual societies. Many multilingual speakers code-mix by using English-based speech and inserting English into their main language (Patwa et al., 2020). They share their views on social media by combining local languages with English, producing large amounts of code-mixed text such as Hindi-English and English-Spanish (Ramanarayanan and Suendermann-Oeft, 2017). Code-mixed social media texts generally take three forms: i) Mixed script: a combination of native and Roman script; ii) Code-mixed script: native and English languages both written in Roman script; iii) Native script: local languages written in their native script. This type of text differs markedly from traditional English text and needs to be handled differently (Prabhu et al., 2016). Beyond the challenges of general sentiment analysis, code-mixed texts pose additional difficulties for natural language processing (NLP) tasks. The implementation of our system is available on GitHub.
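The three script forms listed in the introduction can be told apart, at least roughly, by checking which scripts the letters of a text belong to. The sketch below is an illustration only and not part of the described system; the function name `script_form` and its return labels are hypothetical. It treats any letter whose Unicode name does not start with "LATIN" as native script, and it cannot distinguish Roman-script code-mixed text from monolingual English, since both use the same script.

```python
import unicodedata

def script_form(text):
    """Classify a text by the scripts of its letters.

    Returns 'mixed' (Roman and native letters), 'roman' (Roman letters only),
    'native' (native letters only), or 'unknown' (no letters at all).
    """
    has_roman = has_native = False
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, spaces, and combining marks
        if unicodedata.name(ch, "").startswith("LATIN"):
            has_roman = True
        else:
            has_native = True
    if has_roman and has_native:
        return "mixed"
    if has_native:
        return "native"
    if has_roman:
        return "roman"
    return "unknown"

for sample in ["yeh film bahut acchi thi", "यह film अच्छी थी", "यह फिल्म अच्छी थी"]:
    print(script_form(sample), "|", sample)
```

For English-Spanish data every text falls into the Roman-only case, so a script check like this is mainly informative for pairs such as Hindi-English.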

Related Work

Dataset

System Description

Experiments detail

Findings

Conclusion and Future Work


Publication Date: Jan 1, 2020
Citations: 3
License type: CC BY
