Code-Mixing: A Brief Survey

S Thara, Prabaharan Poornachandran

doi:10.1109/icacci.2018.8554413

Abstract

Indians and many other non-English speakers across the world prefer not to use a single code in their messages on social media platforms. They use transliteration and randomly merged English words through code-mixing, combining two or more languages (English-Spanish, Arabic-English, etc.) to show their linguistic proficiency. Code-mixing (CM) is a rapidly progressing area of research in the domain of text mining. Present-day communication on social media, blogs, and reviews is abuzz with creative, crafty code-mixed messages. This paper presents a comprehensive study of CM across diverse areas of Natural Language Processing (NLP), including language identification, Part-of-Speech (POS) tagging, Named Entity Recognition (NER), polarity identification, and question answering. CM has also been studied in machine translation, dialect identification, speech technologies, etc. Most of the applications of code-mixing are examined and presented briefly in this survey. The study aims to articulate the trends and techniques pursued, the languages used, and the evaluation measures used to report accuracy.

Full Text