Abstract
In India, a country known for its linguistic diversity, code-mixing is a common practice that shapes how people communicate, both in everyday conversation and on social media platforms. The prevalence of code-mixing on social media presents a substantial hurdle for machine translation and other language processing tasks, and the abundance of unstructured code-mixed text on these platforms makes it a crucial research domain within NLP. The blending of Hindi and English, known as Hinglish, and other code-mixed text such as Malayalam-English, Tamil-English, and Telugu-English are particularly prevalent among the younger generation when communicating on social media, and require appropriate processing to aid comprehension by both monolingual users and language processing models. Manual translation of this type of data is laborious due to challenges such as limited vocabulary, potential misunderstandings of context, grammatical errors, biases, and various other issues. Additionally, existing translation models tend to perform more effectively on monolingual text than on code-mixed data. It is therefore desirable to build models that can translate code-mixed data. This study converts code-mixed Hinglish, Malayalam-English, Tamil-English, and Telugu-English text in Romanised script to monolingual English, which can then be given as input to NLP applications such as Sentiment Analysis. This is achieved by fine-tuning pretrained models such as IndicLID for the Language Identification (LID) module and using an ensemble approach that combines transliteration and translation, with IndicTrans and IndicXlit, for code-mixed machine translation. The translated output is then given as input to a classification algorithm that performs Sentiment Analysis and predicts the sentiment.
This approach to translating code-mixed text is observed to perform better than traditional machine translation for the Indian languages Hindi, Tamil, Telugu, and Malayalam.
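The flow of the pipeline described above (language identification, then transliteration and translation to English, then sentiment classification) can be illustrated with a minimal sketch. This is not the study's implementation: the lexicon, word lists, and every function name below are hypothetical toy stand-ins, and a real system would call fine-tuned models such as IndicLID, IndicXlit, and IndicTrans in place of the dictionary lookups.

```python
from typing import Tuple

# Hypothetical toy lexicon standing in for the transliteration + translation
# stage. A real system would use trained models here, not a lookup table.
TOY_HINGLISH_TO_ENGLISH = {
    "yeh": "this",
    "bahut": "very",
    "accha": "good",
    "hai": "is",
    "bura": "bad",
}

# Hypothetical toy sentiment word lists standing in for a trained classifier.
POSITIVE = {"good", "great", "awesome"}
NEGATIVE = {"bad", "terrible", "boring"}


def identify_language(token: str) -> str:
    """Toy token-level LID: 'hi' if the token is in the toy lexicon, else 'en'."""
    return "hi" if token.lower() in TOY_HINGLISH_TO_ENGLISH else "en"


def translate_token(token: str, lang: str) -> str:
    """Map a Romanised Hindi token to English; pass English tokens through."""
    if lang == "hi":
        return TOY_HINGLISH_TO_ENGLISH[token.lower()]
    return token.lower()


def to_monolingual_english(sentence: str) -> str:
    """Run LID and translation token by token (no word reordering)."""
    tokens = sentence.split()
    return " ".join(translate_token(t, identify_language(t)) for t in tokens)


def classify_sentiment(english: str) -> str:
    """Toy classifier: count positive vs. negative words."""
    words = set(english.split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyze(sentence: str) -> Tuple[str, str]:
    """Full pipeline: code-mixed text -> English text -> sentiment label."""
    english = to_monolingual_english(sentence)
    return english, classify_sentiment(english)
```

Calling `analyze("yeh movie bahut accha hai")` on this toy setup yields a word-for-word English rendering and a sentiment label. The sketch only shows how the three stages are chained; it does not reorder words or resolve context, which is what the trained translation models are for.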