With the proliferation of informal content on social media platforms in the form of posts, comments, and feedback, the analysis of code-mixed text is gaining importance. Telugu, a low-resource Indian language, has a large amount of online content generated in code-mixed form. However, the lack of large corpora, annotated data, and Natural Language Processing (NLP) resources is impeding research on Telugu-English code-mixed data. This paper surveys the existing literature on Telugu-English code-mixed text in the areas of resources, part-of-speech (POS) tagging, Named Entity Recognition, language identification, sentiment analysis, application tasks, dialog systems, and Question-Answering. The datasets used by researchers in the field, along with the methods applied to them, are detailed. Research gaps are identified to provide future direction for researchers working in this field.