Deep Learning-Based Language Identification in English-Hindi-Bengali Code-Mixed Social Media Corpora

Anupam Jamatia, Björn Gambäck, Amitava Das

doi:10.1515/jisys-2017-0440

Abstract

This article addresses word-level language identification in Indian social media corpora drawn from Facebook, Twitter and WhatsApp posts that exhibit code-mixing between English and Hindi, between English and Bengali, and across both language pairs. Code-mixing, the blending of multiple languages within one utterance, was previously associated mainly with spoken language, but social media users also employ it when communicating in a casual register. The coarse nature of code-mixed social media text makes language identification challenging. Here, the performance of deep learning on this task is compared to that of feature-based learning: two recurrent neural network techniques, Long Short-Term Memory (LSTM) and bidirectional LSTM, are contrasted with a Conditional Random Fields (CRF) classifier. The results show both deep learners outscoring the CRF, with the bidirectional LSTM achieving the best language identification performance.
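As a rough illustration of the task framing described in the abstract, the sketch below treats word-level language identification as sequence labelling: every token in a post receives one language tag. It also shows the kind of per-token features a feature-based classifier such as a CRF could consume, and a token-level accuracy measure. The function names, the feature set, the tag names ("en", "hi") and the example sentence are all assumptions made for illustration; they are not taken from the article, and no model is trained here.

```python
from typing import Dict, List


def char_ngrams(word: str, n: int) -> List[str]:
    """Return the character n-grams of a lower-cased word padded with < and >."""
    padded = f"<{word.lower()}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a hypothetical feature dict for token i (not the article's feature set)."""
    word = tokens[i]
    feats: Dict[str, object] = {
        "lower": word.lower(),
        "is_ascii": word.isascii(),
        "has_digit": any(ch.isdigit() for ch in word),
        # Neighbouring words give the classifier some local context.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }
    # Character trigrams capture sub-word spelling cues.
    for gram in char_ngrams(word, 3):
        feats[f"tri={gram}"] = True
    return feats


def token_accuracy(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    total = 0
    correct = 0
    for gold_tags, pred_tags in zip(gold, pred):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("gold and predicted tag sequences differ in length")
        total += len(gold_tags)
        correct += sum(g == p for g, p in zip(gold_tags, pred_tags))
    if total == 0:
        raise ValueError("no tokens to evaluate")
    return correct / total


# Illustrative romanised Hindi-English post with one tag per token (assumed tags).
tokens = ["yaar", "this", "movie", "bahut", "achhi", "thi"]
gold_tags = ["hi", "en", "en", "hi", "hi", "hi"]

features = [token_features(tokens, i) for i in range(len(tokens))]
print(features[0]["next"], len(features), len(gold_tags))
```

A recurrent model such as an LSTM would typically replace the hand-written `token_features` step with learned word or character representations, while the one-tag-per-token output format and the token-level evaluation stay the same.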

Journal: Journal of Intelligent Systems
Publication Date: Mar 13, 2018
Citations: 19

Similar Papers

Gujarati Task Oriented Dialogue Slot Tagging Using Deep Neural Network Models
Rachana Parikh ... Hiren Joshi
01 Jan 2020

Towards audio-based identification of Ethio-Semitic languages using recurrent neural network
Amlakie Aschale Alemu ... Ayodeji Olalekan Salau
Scientific Reports | VOL. 13
07 Nov 2023

Language Identification in Mixed Script
Nagesh Bhattu Sristy ... Vadlamani Ravi
08 Dec 2017

A Comparative Study on Various Deep Learning Techniques for Arabic NLP Syntactic Tasks on Noisy Data
Shaima A Abushaala ... Mohammed M Elsheh
23 May 2022
