Abstract
With the proliferation of informal content on social media platforms in the form of posts, comments, and feedback, the analysis of code-mixed text is gaining importance. Telugu, a low-resource Indian language, has a large amount of online content generated in code-mixed form. However, the lack of large corpora, annotated data, and Natural Language Processing (NLP) resources is impeding research on Telugu-English code-mixed data. This paper surveys the existing literature on Telugu-English code-mixed text in the areas of resources, part-of-speech (POS) tagging, Named Entity Recognition, language identification, sentiment analysis, application tasks, dialog systems, and question answering. The datasets used by researchers in the field are detailed, along with the methods applied to them. Research gaps are identified to provide future directions for researchers working in this field.
More From: ACM Transactions on Asian and Low-Resource Language Information Processing