Abstract

This paper presents a comprehensive review of sentiment analysis for code-mixed and code-switched text corpora from Indian social media using machine learning (ML) approaches, based on recent research studies. Code-mixing and code-switching are linguistic behaviors shown by bilingual and multilingual populations, primarily in spoken but also in written communication, especially on social media. Code-mixing involves combining lower-level linguistic units, such as words and phrases of one language, into the sentences of another language (the base language), whereas code-switching involves switching to another language for the length of one sentence or more. In both cases, a bilingual person takes one or more words or phrases from one language and introduces them into another language while communicating in that language in spoken or written mode. People nowadays express their views and opinions on many issues on social media. In multilingual countries, they do so using English as well as their native languages. Code-mixing can be attributed to several causes, including lack of knowledge in one language on a particular subject, expressing empathy, interjection, and clarification. Sentiment analysis of monolingual social media content has been carried out for the last two decades. In recent years, however, the focus of Natural Language Processing (NLP) research has also shifted toward code-mixed data, making code-mixed sentiment analysis an evolving field of research. Systems have been developed using ML techniques to predict the polarity of code-mixed text, and existing models have been fine-tuned to improve their performance.
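The distinction drawn above between code-mixing (several languages inside one sentence) and code-switching (a change of language from one sentence to the next) can be illustrated with a minimal Python sketch. The two word lists, the language codes "hi" and "en", and all function names are hypothetical and exist only for this illustration; a real system would rely on a trained word-level language identifier rather than a hand-written lexicon.

```python
import re
from collections import Counter

# Hypothetical toy lexicons (romanized Hindi and English), invented for
# illustration only; they are not taken from any reviewed corpus.
HINDI_WORDS = {"yeh", "bahut", "achhi", "hai", "mujhe", "pasand", "aayi", "kahani"}
ENGLISH_WORDS = {"the", "movie", "was", "really", "good", "i", "loved", "it", "story"}


def language_counts(sentence):
    """Count how many tokens of each known language a sentence contains."""
    counts = Counter()
    for token in re.findall(r"[a-z]+", sentence.lower()):
        if token in HINDI_WORDS:
            counts["hi"] += 1
        elif token in ENGLISH_WORDS:
            counts["en"] += 1
        # Tokens found in neither lexicon are ignored.
    return counts


def is_code_mixed(sentence):
    """A sentence is treated as code-mixed if it contains more than one language."""
    return len(language_counts(sentence)) > 1


def base_language(sentence):
    """Return the language contributing the most tokens, or None if unknown."""
    counts = language_counts(sentence)
    return counts.most_common(1)[0][0] if counts else None


def has_code_switch(sentences):
    """True if the base language changes between consecutive sentences."""
    bases = [b for b in map(base_language, sentences) if b is not None]
    return any(a != b for a, b in zip(bases, bases[1:]))
```

Under these assumptions, "yeh movie bahut achhi hai" counts as code-mixed with Hindi as its base language, while an English sentence followed by a Hindi sentence counts as a code-switch.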

Highlights

  • People communicate in their native language or any other natural language having official, national or international status

  • With the advent of Natural Language Processing (NLP) tools and techniques, research on the analysis of code-mixed textual data has gained momentum

  • The results show that the most used machine learning (ML) classifier for the sentiment classification of code-mixed Indian language text is the Support Vector Machine (SVM), followed by Naïve Bayes (NB) and Random Forest (RF)
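To make the classifier families named in the highlights more concrete, the following is a minimal sketch of one of them, a multinomial Naïve Bayes classifier with add-one smoothing, written in plain Python. It is not the implementation used in any of the reviewed studies: the class name, the tokenizer, and the four training sentences are invented for illustration, and a practical system would normally use a library implementation trained on an annotated corpus.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # token counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n_label / n_docs)     # log prior
            for token in tokenize(text):
                if token not in self.vocab:
                    continue                       # skip unseen words
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data (romanized Hindi mixed with English).
TRAIN = [
    ("yeh movie bahut achhi hai", "positive"),
    ("i loved the story bahut accha", "positive"),
    ("movie bilkul bakwas thi", "negative"),
    ("very boring kahani bekar hai", "negative"),
]

model = NaiveBayesSentiment().fit([t for t, _ in TRAIN], [l for _, l in TRAIN])
```

With this toy data, `model.predict("story achhi hai")` returns "positive" and `model.predict("bakwas boring movie")` returns "negative". Four sentences are far too few to say anything about real accuracy; the sketch only shows the mechanics of the method.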


Summary

Introduction

People communicate in their native language or in any other natural language that has official, national, or international status. In a bilingual or multilingual community, people use more than one language as their medium of communication. Bilingual speakers often prefer mixed-language constructions on the internet and social media platforms when communicating informally with friends and relatives. The use of more than one language in a piece of text for effective communication, whether through code-mixing, code-switching, or both, is the hallmark of social media text corpora. NLP is the automatic manipulation of natural language text to extract useful information. As an area of Artificial Intelligence (AI), NLP deals with training a machine to process text so that human-computer interaction becomes possible in natural languages [1]. The processing can be as straightforward as counting word frequencies to compare writing styles or as intricate as understanding complete human utterances [2].
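The simplest end of that spectrum, counting word frequencies, can be sketched in a few lines of Python. The function name and the tokenization rule are assumptions made for this illustration, not part of the reviewed work.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return each word's relative frequency (count / total tokens)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {}
    counts = Counter(tokens)
    return {word: count / len(tokens) for word, count in counts.items()}
```

Comparing the frequencies of common function words across two texts is one basic way to contrast writing styles; understanding complete utterances needs far richer models.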
