Abstract

As the number of non-native English speakers on social media has grown rapidly in recent years, sentiment and emotion analysis of regional-language and code-mixed data has gained traction. Despite extensive research on English, work on Hindi–English code-mixed text is still relatively new and understudied. To address this gap and enable future researchers to contribute to this domain, we create an emotion-annotated Hindi–English (Hinglish) code-mixed dataset by adding emotion annotations to the benchmark SentiMix dataset. We propose an end-to-end transformer-based multi-task framework for sentiment detection and emotion recognition on the SentiMix code-mixed dataset. We fine-tune the pre-trained cross-lingual embedding model XLMR on task-specific data, exploiting transfer learning to improve the overall efficiency of our methods. Our proposed multi-task solution outperforms the state-of-the-art single-task and multi-task baselines by a considerable margin, implying that the auxiliary task (i.e., emotion recognition) benefits the primary task (i.e., sentiment detection) in a multi-task setting. The reported results were obtained without any ensemble techniques, in keeping with an effective, production-ready approach to NLP.
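
The abstract does not say how the sentiment and emotion objectives are combined during training. One common choice in multi-task setups, shown here purely as an illustrative assumption, is a weighted sum of per-task cross-entropy losses computed from two classification heads that share one encoder. The function names and the `aux_weight` parameter below are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for a single example (log-sum-exp for stability)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def multitask_loss(
    sentiment_logits: Sequence[float],
    sentiment_label: int,
    emotion_logits: Sequence[float],
    emotion_label: int,
    aux_weight: float = 0.5,  # hypothetical weight on the auxiliary task
) -> float:
    """Joint loss: primary (sentiment) plus weighted auxiliary (emotion).

    Assumption: both sets of logits come from task-specific heads on top of
    a shared encoder; the weighting scheme is illustrative only.
    """
    primary = cross_entropy(sentiment_logits, sentiment_label)
    auxiliary = cross_entropy(emotion_logits, emotion_label)
    return primary + aux_weight * auxiliary
```

Setting `aux_weight` to zero in this sketch reduces it to the single-task sentiment loss, which is one way to compare single-task and multi-task training under otherwise identical conditions.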
