Abstract

Named Entity Recognition (NER) is the task of extracting named entities, such as persons, locations, and organizations, from text. NER is more challenging in social media texts than in formal texts because of their noisy language, which includes grammatical errors and abbreviations. Nevertheless, NER in social media has gained significant attention in the literature because of the volume of information flowing through these platforms. In this paper, we propose a comprehensive model for NER in Turkish texts from distinct social media domains, namely Twitter, Facebook, and the Donanimhaber Forum. The model employs a Bidirectional Long Short-Term Memory (BiLSTM) network followed by a Conditional Random Fields (CRF) layer. To overcome the challenges of social media texts, we incorporate word embeddings, character representations, and morphological, domain, pattern-matching, dictionary, part-of-speech, and casing features into our model. We perform ablation studies to analyze the effect of these features. We demonstrate the effectiveness of our model for tagging Turkish social media texts on the largest Turkish NER dataset.
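Models that combine a BiLSTM with a CRF typically work in two stages. The BiLSTM assigns each token a score for every tag. The CRF layer then chooses the jointly best tag sequence using learned tag-to-tag transition scores, decoded with the Viterbi algorithm. The following is a minimal plain-Python sketch of that decoding step only. It assumes the emission and transition scores are already available. The function name `viterbi_decode`, the BIO tag set, and all scores are hypothetical illustrations and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its total score.

    emissions[t][j]: score of tag j at token t (hypothetically, BiLSTM output).
    transitions[i][j]: score of moving from tag i to tag j (learned by a CRF).
    """
    n_tags = len(transitions)
    # best[j] holds the score of the best path ending in tag j at the current token.
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # Choose the previous tag i that maximizes best[i] + transitions[i][j].
            i_star = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[i_star] + transitions[i_star][j] + emissions[t][j])
            pointers.append(i_star)
        best = new_best
        backpointers.append(pointers)
    # Start from the best final tag and follow the back-pointers to the first token.
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


if __name__ == "__main__":
    tags = ["O", "B-PER", "I-PER"]  # hypothetical BIO tag set
    # Hypothetical emission scores for the tokens "Ahmet", "Yılmaz", "geldi".
    emissions = [
        [0.1, 2.0, 0.5],
        [0.3, 1.2, 1.0],
        [2.0, 0.1, 0.2],
    ]
    # Hypothetical transitions; O -> I-PER is strongly penalized.
    transitions = [
        [0.5, 0.0, -10.0],
        [0.0, -1.0, 1.0],
        [0.0, -1.0, 0.5],
    ]
    path, score = viterbi_decode(emissions, transitions)
    print([tags[i] for i in path], score)
```

On this toy input, taking each token's highest emission score on its own would give B-PER, B-PER, O. The transition scores instead lead the decoder to B-PER, I-PER, O, with a total score of 6.0. Sequence-level consistency of this kind is the usual motivation for placing a CRF on top of a BiLSTM.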
