Abstract

Named entity recognition (NER) is fundamental to many natural language processing applications. It involves finding spans of text and classifying them into predefined categories such as person names, locations, and so on. One of the best-known approaches to identifying named entities is the rule-based approach. This paper introduces a rule-based NER method for Classical Arabic documents. The proposed method relies on trigger words, patterns, gazetteers, rules, and blacklists derived from linguistic information about named entities in Arabic. The method operates in three stages: an operational stage, a preprocessing stage, and a processing stage in which the rules are applied. In evaluation, the approach achieved a precision of 90.2%, a recall of 89.3%, and an F-measure of 89.5%. The approach was introduced to overcome the coverage limitations of rule-based NER systems, especially on Classical Arabic texts. It improved their performance and allowed for automated rule updates. The grammar rules, gazetteers, blacklist, patterns, and trigger words were all integrated into a single rule-based system.

Highlights

  • Named entity recognition is a crucial step in numerous natural language processing (NLP) applications such as machine translation, question answering, and information retrieval, to name a few [1, 2]

  • We introduce a rule-based named entity recognition (NER) method that can be used to examine Classical Arabic documents. The proposed method relies on trigger words, patterns, gazetteers, rules, and blacklists generated from linguistic information pertaining to named entities in Arabic

  • Then, the operational contents were discussed along with the preprocessing and processing stages. The new approach proposed by this study used trigger words, gazetteers, regular expressions, grammatical rules, and blacklists, and the methodology was explained


Summary

Introduction

Named entity recognition is a crucial step in numerous natural language processing (NLP) applications such as machine translation, question answering, and information retrieval, to name a few [1, 2]. Arabic is a morphologically complex language due to its inflectional nature: a word has the general form prefix(es) + stem + suffix(es), with the number of prefixes and suffixes ranging from zero to many. Another issue is that, depending on its position in the word, an Arabic letter can take up to three different forms [9, 10]. The proposed method relies on trigger words, patterns, gazetteers, rules, and blacklists generated from linguistic information pertaining to named entities in Arabic.
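
To make the interplay of these resources concrete, the following is a minimal sketch, not the authors' implementation. It assumes whitespace tokenization and uses tiny, hypothetical word lists (`GAZETTEER`, `TRIGGERS`, `BLACKLIST`, `PREFIXES`) invented purely for illustration. A token is tagged if it appears in the gazetteer, either as-is or after removing one attached prefix, or if it directly follows a trigger word. Blacklisted tokens are never tagged.

```python
# Hypothetical toy resources; a real system would use far larger lists.
GAZETTEER = {"دمشق": "LOCATION", "بغداد": "LOCATION"}  # Damascus, Baghdad
TRIGGERS = {"الشيخ": "PERSON", "مدينة": "LOCATION"}    # "the sheikh", "city of"
BLACKLIST = {"الذي", "التي"}                           # relative pronouns, never entities
PREFIXES = ("و", "ف", "ب", "ل", "ك")                   # common one-letter attached prefixes


def gazetteer_type(token):
    """Look up a token in the gazetteer, as-is or with one prefix removed."""
    if token in GAZETTEER:
        return GAZETTEER[token]
    for prefix in PREFIXES:
        stem = token[len(prefix):]
        # Strip a prefix only when the remainder is a known entry, so that
        # a name that merely begins with one of these letters is left alone.
        if token.startswith(prefix) and stem in GAZETTEER:
            return GAZETTEER[stem]
    return None


def tag_entities(text):
    """Return (token, type) pairs for every token the toy rules recognize."""
    tokens = text.split()
    entities = []
    for i, token in enumerate(tokens):
        if token in BLACKLIST:
            continue  # the blacklist overrides every other rule
        entity_type = gazetteer_type(token)
        if entity_type is None and i > 0 and tokens[i - 1] in TRIGGERS:
            # A trigger word assigns its type to the token that follows it.
            entity_type = TRIGGERS[tokens[i - 1]]
        if entity_type is not None:
            entities.append((token, entity_type))
    return entities


# "Sheikh Ahmad visited Damascus and Baghdad"
print(tag_entities("زار الشيخ أحمد دمشق وبغداد"))
```

In this sketch the gazetteer takes priority over trigger words, and the blacklist takes priority over both. That ordering is a design assumption made here for simplicity, not something stated in the source.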
