Abstract
Named Entity Recognition (NER) is an important task in many natural language processing applications, such as machine translation, question answering, information extraction, text summarization, semantic applications and word sense disambiguation. The rule-based approach is one of the techniques used in NER to identify named entities such as person names, location names and organization names. Recent rule-based methods have been applied to recognize person names in the political domain, and they ignored the recognition of other named entity types such as locations and organizations. We used a rule-based approach to recognize one named entity type, person names, in Arabic text. We developed four rules that identify person names based on the position of the name. We used an in-house Arabic corpus collected from newspaper archives. The system output was compared with manually annotated text in order to compute precision, recall and F-measure. In our experiments, the average F-measure for recognizing person names was 92.66%, 92.04% and 90.43% in the sport, economic and political domains, respectively. The experimental results show that our rule-based method achieved its highest F-measure in the sport domain, compared with the political and economic domains.
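The evaluation above compares system output with a manual annotation. With TP the number of predicted names that match the annotation, precision is P = TP / (number of predicted names), recall is R = TP / (number of annotated names), and F-measure is F = 2PR / (P + R). A minimal Python sketch of that computation follows, assuming exact matching of entity spans (the matching criterion is not specified here). The span representation, the function name `precision_recall_f1` and the toy data are hypothetical and are not taken from the paper's corpus.

```python
from typing import Iterable, Set, Tuple

# Hypothetical entity representation: (start_token, end_token, entity_type)
Span = Tuple[int, int, str]


def precision_recall_f1(predicted: Iterable[Span], gold: Iterable[Span]) -> Tuple[float, float, float]:
    """Exact-match evaluation of predicted entity spans against a manual annotation."""
    pred_set: Set[Span] = set(predicted)
    gold_set: Set[Span] = set(gold)
    # A prediction counts as correct only if its boundaries and type match exactly.
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    # Toy data for illustration only, not results from the paper.
    gold = [(0, 2, "PERSON"), (5, 6, "PERSON"), (10, 12, "PERSON"), (15, 16, "PERSON")]
    predicted = [(0, 2, "PERSON"), (5, 6, "PERSON"), (10, 11, "PERSON")]
    p, r, f = precision_recall_f1(predicted, gold)
    print(f"P={p:.2%} R={r:.2%} F={f:.2%}")  # prints: P=66.67% R=50.00% F=57.14%
```

Under exact matching, the partially correct prediction (10, 11) counts as both a false positive and a missed annotation; a lenient overlap criterion would score it differently.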
Highlights
Named Entity Recognition (NER) is an important task in many natural language processing applications, such as machine translation, question answering, information extraction, text summarization, semantic applications and word sense disambiguation
The rule-based approach is one of the techniques used in named entity recognition to identify named entities such as person names, location names and organization names
The corpus is an in-house corpus collected from online Arabic newspaper archives, including koora.net, aleqt.net and Alquds.net. It covers three domains: sport, economic and political. It is an electronic corpus of Modern Standard Arabic used for named entity recognition
Summary
Named Entity Recognition (NER) is an important task in many natural language processing applications, such as machine translation, question answering, information extraction, text summarization, semantic applications and word sense disambiguation. Recent works (Elsebai et al., 2009; Shaalan, 2010) address named entity recognition in Arabic using the rule-based approach, but they developed their rules for the political domain and ignored other domains such as economic, sport and health. Their approach is therefore limited to political text and cannot be applied to other domains. We introduce four rules for identifying person names based on the position of the name.
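The four position-based rules themselves are not spelled out in this summary. The sketch below only illustrates what a position-based rule for Arabic person names can look like; it is not the authors' rule set. It assumes whitespace-tokenized text, and the trigger-word list, first-name gazetteer, stop tokens, two-token name length and the function names `recognize_person_names` and `extend_name` are all hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # token indices [start, end)

# Hypothetical lexicons for illustration only (not the paper's word lists).
TITLE_TRIGGERS: Set[str] = {"الرئيس", "السيد", "اللاعب", "الوزير"}  # the president, Mr., the player, the minister
FIRST_NAMES: Set[str] = {"محمد", "أحمد", "خالد", "سارة"}
STOP_TOKENS: Set[str] = {"،", ".", "في", "على", "إلى", "من"}  # punctuation and prepositions that end a name


def extend_name(tokens: List[str], start: int, max_len: int = 2) -> int:
    """Return the exclusive end index of a name that starts at `start` (at most max_len tokens)."""
    end = start + 1
    while (end < len(tokens) and end - start < max_len
           and tokens[end] not in STOP_TOKENS and tokens[end] not in TITLE_TRIGGERS):
        end += 1
    return end


def recognize_person_names(tokens: List[str]) -> List[Span]:
    """Apply two hypothetical position-based rules in a single left-to-right scan."""
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        start = None
        # Rule A (hypothetical): the name occupies the position right after a title trigger word.
        if (tokens[i] in TITLE_TRIGGERS and i + 1 < len(tokens)
                and tokens[i + 1] not in (STOP_TOKENS | TITLE_TRIGGERS)):
            start = i + 1
        # Rule B (hypothetical): a known first name with no trigger before it starts a name.
        elif tokens[i] in FIRST_NAMES:
            start = i
        if start is None:
            i += 1
            continue
        end = extend_name(tokens, start)
        spans.append((start, end))
        i = end  # resume after the name so nested matches are skipped
    return spans


if __name__ == "__main__":
    sentence = "قال الرئيس محمد سالم في القاهرة إن اللاعب خالد حسن لن يشارك"
    tokens = sentence.split()
    for start, end in recognize_person_names(tokens):
        print(" ".join(tokens[start:end]))
```

Scanning left to right and resuming after each matched name keeps the two hypothetical rules from producing nested or duplicate spans. A practical system would also need to handle clitics attached to trigger words (for example, a prefixed preposition), which this sketch ignores.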