Abstract

Applying Natural Language Processing (NLP) tasks to social media corpora is attractive but challenging, because social media users often communicate in casual language that contains out-of-vocabulary (OOV) words and internet abbreviations (slang). The performance of NLP tasks on social media text therefore needs to be improved. We focus on a fundamental NLP task, Named Entity Recognition (NER), which assigns each entity a label such as person, location, or organization, and apply it to Twitter text. We improve NER by converting non-standard entities to their canonical form, a task called Named Entity Normalization (NEN). In this paper, we propose a novel weakly supervised joint approach to named entity recognition and normalization for noisy text. We jointly conduct weakly supervised NER and normalization of both single-token OOV words and multi-token slang in order to recognize named entities of any type and restore them to their canonical form. This approach can give better results than existing state-of-the-art NER systems, NEN systems, and pipeline approaches.

