Abstract

This article describes our rule-based Arabic named entity recognition (NER) and classification system. The system is based on lists of classified proper names (PN) and, above all, on syntactico-semantic patterns that yield a fine-grained classification of Arabic named entities (NE). These patterns combine morpho-syntactic and syntactic entities at the syntactico-semantic level. The system also uses a lexical classification of trigger words and NE extensions. These linguistic data are essential not only for extracting named entities but also for classifying them taxonomically and for determining NE boundaries. Our method also relies on contextualisation and on the notion of NE class attributes and values. Inspired by X-bar theory and immediate-constituent analysis, we built a rule-based NER system composed of five levels of syntactico-semantic combination. We also show how the fine-grained NE annotations in our system's output (an XML database) are exploited in information retrieval and information extraction.

Highlights

  • NE extraction is a widely studied subtask of information extraction

  • The results obtained from processing the corpus are given in Table 2 (International Journal on Natural Language Computing (IJNLC), Vol. 9, No. 6, December 2020)

  • The morpho-syntactic analysis is based on DIINAR, which is a rich Arabic lexical database


Summary

INTRODUCTION

NE extraction is a widely studied subtask of information extraction. Named entities are essential for many natural language processing applications, such as information retrieval, information extraction, machine translation, strategic foresight and question-answering systems. Little work has been done on Arabic NE extraction and classification, especially with rule-based systems. Three main approaches are used in the NE extraction process: statistical, rule-based and hybrid. We describe the linguistic basis of Arabic NE analysis and the structures and constituents of Arabic NE, and we show how these constituents are combined into simple and complex NE. We present the evaluation of our NE extraction and classification system, based on the processing of two journalistic corpora, and give examples of how the system contributes to information retrieval and extraction.
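To make the rule-based approach more concrete, the following minimal Python sketch shows how a trigger word and a list of classified proper names could be combined to delimit and classify an NE. It is only an illustration under stated assumptions: the lexicon entries, the class labels `PERSON` and `LOCATION`, the function name `extract_entities`, and the choice to include the trigger word inside the NE span are all hypothetical and are not taken from the system described here, which works with five levels of rules and a full morpho-syntactic analysis.

```python
# Hypothetical toy lexicons; the system's actual lists are not given in this summary.
TRIGGER_CLASSES = {
    "الدكتور": "PERSON",    # "the doctor"
    "مدينة": "LOCATION",    # "city"
}
PROPER_NAMES = {"أحمد", "علي", "القاهرة"}  # Ahmed, Ali, Cairo


def extract_entities(tokens):
    """Return (class, start, end) spans: a trigger word followed by proper names.

    The span covers the trigger word and every immediately following token
    found in PROPER_NAMES (end index is exclusive).
    """
    entities = []
    i = 0
    while i < len(tokens):
        ne_class = TRIGGER_CLASSES.get(tokens[i])
        if ne_class is None:
            i += 1
            continue
        # Extend the right boundary while tokens are known proper names.
        j = i + 1
        while j < len(tokens) and tokens[j] in PROPER_NAMES:
            j += 1
        if j > i + 1:
            # At least one proper name followed the trigger word.
            entities.append((ne_class, i, j))
            i = j
        else:
            i += 1
    return entities


# "Doctor Ahmed Ali arrived in the city of Cairo", split on whitespace.
SENTENCE = "وصل الدكتور أحمد علي إلى مدينة القاهرة"
TOKENS = SENTENCE.split()

for ne_class, start, end in extract_entities(TOKENS):
    print(ne_class, " ".join(TOKENS[start:end]))
```

In this sketch the trigger word supplies the class and the left boundary, while the proper-name list supplies the right boundary. A real system would also need morphological analysis, since Arabic attaches clitics to words, which plain whitespace tokenization ignores.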

GENERAL BACKGROUND
LINGUISTIC INFORMATION
Syntactico-semantic information
Trigger word
Syntactic approach
Semantic approach
NE extensions
LEVELS OF SYNTACTICO-SEMANTIC RULE CONSTRUCTION
Level 1
NP with annexion A)
Level 4
Level 5
EVALUATION AND APPLICATION
CONCLUSION AND PERSPECTIVES