Automatic Arabic Named Entity Extraction and Classification for Information Retrieval

Omar Asbayou

doi:10.5121/ijnlc.2020.9601

Abstract

This article describes our rule-based Arabic named entity recognition (NER) and classification system. It relies on lists of classified proper names (PN) and, above all, on syntactico-semantic patterns that yield a fine-grained classification of Arabic NEs. These patterns are syntactico-semantic combinations of morpho-syntactic and syntactic entities. The system also uses a lexical classification of trigger words and NE extensions. These linguistic data are essential for three tasks: named entity extraction, taxonomic classification, and determining NE frontiers (boundaries). Our method also relies on contextualisation and on the notion of NE class attributes and values. Inspired by X-bar theory and immediate-constituent analysis, we built a rule-based NER system composed of five levels of syntactico-semantic combination. We also show how the fine-grained NE annotations in our system's output (an XML database) are exploited in information retrieval and information extraction.
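The core mechanism in the abstract can be illustrated with a minimal Python sketch. A trigger word fixes an NE's class. Together with extensions and classified proper names, it also fixes the NE's frontiers. The resulting annotations are stored as XML and then queried. Everything concrete in the sketch is a hypothetical assumption for illustration and not the author's resources or rules: the three toy lexicons (`TRIGGERS`, `EXTENSIONS`, `PROPER_NAMES`), the single pattern `TRIGGER EXTENSION* PN+`, the `<NE class=... subclass=...>` XML schema, and whitespace tokenisation. It does not reproduce the five-level rule cascade or any morpho-syntactic analysis.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical toy lexicons for illustration only (not the paper's resources).
# Trigger words: nouns that announce an NE and fix its class and subclass.
TRIGGERS = {
    "الرئيس": ("PERSON", "politician"),  # "the president"
    "مدينة": ("LOCATION", "city"),        # "city (of)"
}
# Hypothetical NE extensions: adjectives allowed between trigger and name.
EXTENSIONS = {"الفرنسي", "المغربي"}  # "French", "Moroccan"
# Hypothetical list of classified proper names (PN).
PROPER_NAMES = {
    "إيمانويل": "PERSON",
    "ماكرون": "PERSON",
    "الرباط": "LOCATION",
}


@dataclass
class Entity:
    start: int      # index of the first token of the NE
    end: int        # index one past the last token (the NE frontier)
    ne_class: str
    subclass: str
    text: str


def extract(tokens):
    """Apply one toy pattern, TRIGGER EXTENSION* PN+, then fall back to bare PN runs."""
    entities = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in TRIGGERS:
            ne_class, subclass = TRIGGERS[tok]
            j = i + 1
            while j < len(tokens) and tokens[j] in EXTENSIONS:
                j += 1  # skip optional extensions
            k = j
            while k < len(tokens) and tokens[k] in PROPER_NAMES:
                k += 1  # consume the proper-name run
            if k > j:  # the trigger only counts if at least one PN follows it
                entities.append(Entity(i, k, ne_class, subclass, " ".join(tokens[i:k])))
                i = k
                continue
        if tok in PROPER_NAMES:
            # No trigger: group adjacent PNs of the same class; subclass unknown.
            pn_class = PROPER_NAMES[tok]
            k = i
            while k < len(tokens) and PROPER_NAMES.get(tokens[k]) == pn_class:
                k += 1
            entities.append(Entity(i, k, pn_class, "unknown", " ".join(tokens[i:k])))
            i = k
            continue
        i += 1
    return entities


def to_xml(entities):
    """Store annotations as <NE class=... subclass=...> elements under <doc>."""
    root = ET.Element("doc")
    for e in entities:
        ne = ET.SubElement(root, "NE", {"class": e.ne_class, "subclass": e.subclass})
        ne.text = e.text
    return root


def find_by_class(root, ne_class):
    """Retrieval step: return the text of every NE of the requested class."""
    return [ne.text for ne in root.findall(f".//NE[@class='{ne_class}']")]


if __name__ == "__main__":
    # "French President Emmanuel Macron arrived in the city of Rabat"
    sentence = "وصل الرئيس الفرنسي إيمانويل ماكرون إلى مدينة الرباط"
    root = to_xml(extract(sentence.split()))
    print(ET.tostring(root, encoding="unicode"))
    print(find_by_class(root, "PERSON"))
```

On the example sentence the sketch finds two entities. The first is a PERSON spanning the trigger, the extension and both names. The second is a LOCATION spanning مدينة الرباط. Placing the trigger word inside the NE frontier is a choice made for this sketch. Real Arabic text would also need clitic segmentation (e.g. a prefixed و or ب), which plain whitespace splitting ignores.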

Highlights

NE extraction is a widely studied subtask of information extraction
The results obtained from processing the corpus are reported in Table 2 of the paper (International Journal on Natural Language Computing (IJNLC), Vol. 9, No. 6, December 2020).
The morpho-syntactic analysis relies on DIINAR, a rich Arabic lexical database.

Summary

INTRODUCTION

NE extraction is a widely studied subtask of information extraction. Named entities are essential for many natural language processing applications, such as information retrieval, information extraction, machine translation, strategic foresight and question-answering systems. Little work has been done on Arabic NE extraction and classification, especially with rule-based systems. Three main approaches are used in NE extraction: statistical, rule-based and hybrid. We describe the linguistic basis of Arabic NE analysis and the structures and constituents of Arabic NEs, and we show how these constituents combine into single and complex NEs. We then present the evaluation of our NE extraction and classification system on two journalistic corpora, and give examples of the system's contribution to information retrieval and extraction.

GENERAL BACKGROUND

LINGUISTIC INFORMATION

Syntactico-semantic information

Trigger word

Syntactic approach

Semantic approach

NE extensions

LEVELS OF SYNTACTICO-SEMANTIC RULE CONSTRUCTION

Level 1

A) NP with annexation

Level 4

Level 5

EVALUATION AND APPLICATION

CONCLUSION AND PERSPECTIVES


Journal: International Journal on Natural Language Computing
Publication Date: Dec 30, 2020
License type: cc-by
