Abstract

Named Entity Recognition (NER) is an essential prerequisite for many natural language processing tasks. All public corpora for Persian named entity recognition, such as ParsNERCorp and ArmanPersoNERCorpus, are based on the Bijankhan corpus, which originated from the Hamshahri newspaper in 2004. Consequently, most published Persian named entity recognition models are specifically tuned for news data and are not flexible enough to be applied to other text categories, such as social media texts. This study introduces ParsNER-Social, a corpus for training named entity recognition models in the Persian language built from social media sources. The corpus consists of 205,373 tokens and their NER tags, crawled from social media content comprising 10 Telegram channels in 10 different categories. Furthermore, three supervised methods are introduced and trained on the ParsNER-Social corpus: two conditional random field models as baselines and one state-of-the-art deep learning model with six different configurations are evaluated on the proposed dataset. The experiments show that the Mono-Lingual Persian models based on Bidirectional Encoder Representations from Transformers (MLBERT) outperform the other approaches on the ParsNER-Social corpus. Among the different configurations of MLBERT models, the ParsBERT+BERT-TokenClass model obtained an F1-score of 89.65%.
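
For readers unfamiliar with the ParsBERT+BERT-TokenClass configuration named above, the sketch below shows, under assumptions, how such a setup is typically assembled with the Hugging Face transformers library: a pre-trained ParsBERT checkpoint with a token-classification head fine-tuned for NER. The checkpoint identifier and the NER label set are illustrative assumptions, not details taken from the paper.

    # Minimal sketch (not the authors' code): a ParsBERT encoder with a
    # token-classification head, the general pattern behind "BERT-TokenClass".
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    # Hypothetical IOB2 label set; the actual ParsNER-Social tag set may differ.
    labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]

    # Assumed ParsBERT checkpoint name, used here only for illustration.
    model_name = "HooshvareLab/bert-base-parsbert-uncased"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForTokenClassification.from_pretrained(
        model_name, num_labels=len(labels)
    )

    # Encode a Persian sentence and take the per-token argmax over label logits;
    # in practice the model would first be fine-tuned on the NER corpus.
    encoding = tokenizer("تهران پایتخت ایران است", return_tensors="pt")
    outputs = model(**encoding)
    predictions = outputs.logits.argmax(dim=-1)

In a full training recipe, word-level NER tags would be aligned to the subword tokens produced by the tokenizer before fine-tuning with a standard token-classification training loop.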
