Abstract

Code-mixing on social media is common in multilingual countries such as India, where Hindi and English are the major languages of communication. Sentiment analysis helps in understanding users' opinions on social, economic, and political issues, and it removes the need to monitor every review manually, which is a cumbersome task. However, performing sentiment analysis on code-mixed data is challenging: such text contains many out-of-vocabulary terms and raises numerous other issues, and the problem is still a relatively new area of natural language processing. This work deals with such text and builds an ensemble classifier to detect sentiment polarity. Our classifier combines a multilingual variant of RoBERTa with sentence-level embeddings from the Universal Sentence Encoder to identify the sentiment of code-mixed tweets with higher accuracy. The ensemble improves the classifier's performance by using the strengths of both models for transfer learning. Experiments were conducted on real-life benchmark datasets to determine the sentiment of their contents. The proposed classifier framework is compared with other baselines and deep learning models on five datasets. The results show improved accuracy, precision, and recall for the proposed classifier. On code-mixed datasets, our classifier achieves an accuracy of 66% on Joshi et al. 2016, 60% on SAIL 2017, and 67% on the SemEval 2020 Task-9 dataset, an average improvement of around 3% over contemporary baselines.
