Machine learning approach towards language identification of Code-Mixed Hindi-English and Urdu-English Social Media Text

Gazi Imtiyaz Ahmad, Jimmy Singla

doi:10.1109/mecon53876.2022.9751958

Abstract

Social media has become an important and convenient tool for accessing information that is beneficial in education, marketing, finance and communication. The number of social media users grows significantly with each passing day, resulting in a massive volume of data readily available to Natural Language Processing (NLP) researchers. People, especially in multilingual societies, prefer to write in multiple languages and use code-mixing and code-switching to express their views, which makes NLP tasks more challenging and complex. A language identification system is therefore a necessity for building complex NLP systems on code-mixed data. Although language identification for English and other monolingual text is treated as a solved problem in many NLP applications, the noisy nature of code-mixed text makes language identification a complex and still unsolved task. In recent years, machine learning approaches have attracted significant attention for classification problems. In this paper, machine learning approaches using Multinomial Naïve Bayes, Decision Tree and Support Vector Machine are used for word-level language identification in English-Hindi and English-Urdu code-mixed social media text. Support Vector Machine, with accuracies of 83.58% and 75.79% for Hindi-English and Urdu-English respectively, performs better than the other two approaches.
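
To make the idea of word-level identification concrete, here is a minimal sketch of one of the three classifier families named in the abstract, Multinomial Naïve Bayes, written in plain Python. The abstract does not say which features the authors used, so the character n-gram features, the Laplace smoothing value, the helper names (`char_ngrams`, `WordLevelNaiveBayes`) and the six toy training words are all assumptions made for illustration. None of this is the paper's data or implementation, and the Decision Tree and Support Vector Machine models are not shown.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(word, n_min=1, n_max=3):
    """Return character n-grams of a lower-cased word padded with ^ and $."""
    padded = f"^{word.lower()}$"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


class WordLevelNaiveBayes:
    """Multinomial Naive Bayes over character n-grams (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace smoothing (assumed)
        self.ngram_counts = defaultdict(Counter)  # label -> n-gram counts
        self.label_counts = Counter()             # label -> training words
        self.vocab = set()                        # all n-grams seen in training

    def fit(self, words, labels):
        for word, label in zip(words, labels):
            grams = char_ngrams(word)
            self.label_counts[label] += 1
            self.ngram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict_one(self, word):
        total_words = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # log prior + sum of smoothed log likelihoods of each n-gram
            score = math.log(count / total_words)
            denom = (sum(self.ngram_counts[label].values())
                     + self.alpha * len(self.vocab))
            for gram in char_ngrams(word):
                numer = self.ngram_counts[label][gram] + self.alpha
                score += math.log(numer / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, words):
        return [self.predict_one(w) for w in words]


# Hypothetical toy data: three English and three romanized Hindi words.
train_words = ["the", "this", "that", "kya", "kyun", "kaise"]
train_labels = ["en", "en", "en", "hi", "hi", "hi"]
model = WordLevelNaiveBayes().fit(train_words, train_labels)

# Tag each token of a made-up code-mixed message with a language label.
tokens = "kya this works".split()
print(list(zip(tokens, model.predict(tokens))))
```

Character n-grams are a common choice for noisy, romanized text because spelling varies from writer to writer, but a real system would need a labelled code-mixed corpus and a held-out evaluation before its accuracy could be compared with the figures reported above.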


Similar Papers

Word Level Language Identification of Code Mixing Text in Social Media using NLP
Kasthuri Shanmugalingam ... Sagara Sumathipala
01 Dec 2018

An Effective Bi-LSTM Word Embedding System for Analysis and Identification of Language in Code-Mixed social Media Text in English and Roman Hindi
Shashi Shekhar ... Dilip Kumar Sharma
Computación y Sistemas | VOL. 24
09 Dec 2020

Experiences of sexual minorities on social media: A study of sentiment analysis and machine learning approaches
Peter Appiahene ... Tao Zhang
Journal of Autonomous Intelligence | VOL. 6
04 Aug 2023

Ensemble Machine Learning Approach for Stress Detection in Social Media Texts
Maleeha Illahi
Quaid-e-Awam University Research Journal of Engineering, Science & Technology | VOL. 20
28 Dec 2022
