Named Entity Recognition for Hindi-English Code-Mixed Social Media Text

Vinay Singh, Manish Shrivastava, Deepanshu Vijay, Syed Sarfaraz Akhtar

doi:10.18653/v1/w18-2405

Abstract

Named Entity Recognition (NER) is a major task in the field of Natural Language Processing (NLP) and a sub-task of Information Extraction. The challenge of NER for tweets lies in the insufficient information available in a tweet. A significant amount of work has been done on entity extraction, but only for resource-rich languages and domains such as newswire. Entity extraction is, in general, a challenging task for such informal text, and code-mixed text further complicates the process with its unstructured and incomplete information. We propose experiments with different machine learning classification algorithms using word, character and lexical features. The algorithms we experimented with are Decision Tree, Long Short-Term Memory (LSTM), and Conditional Random Field (CRF). In this paper, we present a corpus for NER in Hindi-English code-mixed text, along with extensive experiments on our machine learning models, which achieved a best F1-score of 0.95 with both CRF and LSTM.
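To make the idea of word, character and lexical features concrete, the following is a minimal sketch of per-token feature extraction of the kind typically fed to a CRF or decision tree. The specific features, the function names `token_features` and `sentence_features`, and the example sentence are illustrative assumptions, not the feature set or data used in the paper.

```python
def token_features(tokens, i):
    """Build a feature dict for tokens[i].

    Hypothetical feature set for illustration only; the paper's
    actual features are not listed in this summary.
    """
    tok = tokens[i]
    lower = tok.lower()
    feats = {
        # Word-level features: the token and its immediate neighbours.
        "word.lower": lower,
        "prev.lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
        # Lexical (surface-form) features common in social media text.
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "is_hashtag": tok.startswith("#"),
        "is_mention": tok.startswith("@"),
    }
    # Character-level features: all character bigrams and trigrams.
    for n in (2, 3):
        for j in range(len(lower) - n + 1):
            feats[f"char{n}:{lower[j:j + n]}"] = True
    return feats


def sentence_features(tokens):
    """Return one feature dict per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Made-up code-mixed sentence, whitespace-tokenized for simplicity.
tweet = "Aaj Delhi mein bahut garmi hai #summer".split()
features = sentence_features(tweet)
```

Character n-grams are included because romanized Hindi has no standard spelling, so sub-word overlap can help relate spelling variants of the same word. Lists of feature dictionaries like `features` are a common input format for CRF toolkits, though the exact interface depends on the library chosen.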

Highlights

Multilingual speakers often switch back and forth between languages when speaking or writing, mostly in informal settings
Bali et al. performed an analysis of data from Facebook posts generated by English-Hindi bilingual users
Sharma et al. addressed the problem of shallow parsing of Hindi-English code-mixed social media text and developed a system that can identify the language of each word, normalize words to their standard forms, assign POS tags, and segment the text into chunks

Summary

Introduction

Multilingual speakers often switch back and forth between languages when speaking or writing, mostly in informal settings. Code-mixing refers to the use of linguistic units from different languages in a single utterance or sentence, whereas code-switching refers to the co-occurrence of speech extracts belonging to two different grammatical systems (Gumperz). As both phenomena are frequently observed on social media platforms in similar contexts, we use only the code-mixing scenario in this work. Vyas et al. formalized the problem, created a POS-tag-annotated Hindi-English code-mixed corpus, and reported the challenges and problems in Hindi-English code-mixed text. They performed experiments on language identification, transliteration, normalization and POS tagging of the dataset. Barman et al. addressed the problem of language identification on Bengali-Hindi-English Facebook comments.

Publication Date: Jan 1, 2018
Citations: 47
License type: cc-by
