Abstract

As an increasing number of people embrace social media, mining the data it generates has become an important task. Applications range from opinion mining and sentiment analysis to hate speech detection. Analyzing code-mixed multilingual text in particular has gained popularity because it holds important socio-cultural clues that may be lost in translation. This paper explores methods to effectively analyze code-mixed Hindi/English (Hinglish) text. First, we generate a large-scale code-mixed corpus to aid further research on code-mixed social media text. We then train high-quality word embeddings on this corpus. Finally, we demonstrate the efficacy of our proposed method by training machine learning models that improve upon the previous state of the art using a much lighter and more explainable architecture. Our aim in training the classifier was not only high performance but also good model explainability and speed.
