Abstract

Semantic annotation of named entities for enriching unstructured content is a critical step in the development of the Semantic Web and of many Natural Language Processing applications. To this end, this paper addresses the named entity disambiguation problem, which aims at detecting entity mentions in a text and then linking them to entries in a knowledge base. We propose a hybrid method, combining heuristics and statistics, for named entity disambiguation. The novelty is that the disambiguation process is incremental: it runs in several rounds that filter the candidate referents by exploiting previously identified entities, and it extends the text with the attributes of those entities each time they are successfully resolved in a round. Experiments are conducted to evaluate the proposed method and show its advantages. The experimental results show that our approach achieves high accuracy and can be used to construct a robust entity disambiguation system.

Highlights

  • In Information Extraction (IE) and Natural Language Processing (NLP) areas, named entities (NE) are people, organizations, locations, and others that are referred to by proper names

  • For the text “About three-quarters of white, college-educated men age over 65 use the Internet, says Susannah Fox, [...] John McCain is an outlier when you compare him to his peers, Fox says.”, there are 164 entities in the Wikipedia version used with the same name “Fox”

  • Due to the aforementioned possible error of a named entity recognition module splitting a name into two separate ones, we introduce the notion of partially correct mappings

Summary

Introduction

In the Information Extraction (IE) and Natural Language Processing (NLP) areas, named entities (NEs) are people, organizations, locations, and other things that are referred to by proper names. The name “John McCarthy” in different occurrences may refer to different NEs, such as a computer scientist from Stanford University, a linguist from the University of Massachusetts Amherst, an Australian ambassador, a British journalist who was kidnapped by Iranian terrorists in Lebanon in April 1986, etc. Such ambiguity makes the identification of NEs more difficult and raises the NE disambiguation (NED) problem as one of the main challenges to research in the Semantic Web and in natural language processing in general. The proposed method is both rule-based and statistics-based. For disambiguation, it utilizes the NEs and related terms that co-occur with the target entity in a text and in Wikipedia, the intuition being that these convey the entity's relationships and attributes, respectively. We use the terms name and mention interchangeably, and likewise the terms entity and referent.
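To make the incremental idea concrete, here is a minimal Python sketch of a round-based loop in which each resolved entity enriches the context used for the remaining mentions. It is an illustration under stated assumptions, not the paper's method: the `Candidate` class, the `overlap` score (a plain term-overlap count standing in for the heuristics and the statistical ranking model), the `max_rounds` parameter, and the toy data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """Hypothetical knowledge-base entry described by a set of attribute terms."""
    entity_id: str
    attributes: set = field(default_factory=set)


def overlap(context, cand):
    """Assumed score: number of candidate attribute terms found in the context."""
    return len(context & cand.attributes)


def disambiguate(mentions, context, max_rounds=5):
    """Resolve mentions over several rounds.

    mentions: dict mapping a mention string to its list of candidates
    context:  set of terms taken from the text
    Returns a dict mapping each resolved mention to an entity_id.
    """
    context = set(context)  # work on a copy; it grows as entities are resolved
    resolved = {}
    for _ in range(max_rounds):
        progress = False
        for mention, cands in mentions.items():
            if mention in resolved or not cands:
                continue
            ranked = sorted(cands, key=lambda c: overlap(context, c), reverse=True)
            best_score = overlap(context, ranked[0])
            runner_up = overlap(context, ranked[1]) if len(ranked) > 1 else -1
            # Resolve only when one candidate is the clear, non-zero winner.
            if best_score > 0 and best_score > runner_up:
                resolved[mention] = ranked[0].entity_id
                # Extend the text context with the resolved entity's attributes.
                context |= ranked[0].attributes
                progress = True
        if not progress:
            break  # no mention was resolved in this round; stop early
    return resolved


# Toy data (invented for illustration only).
mentions = {
    "AI Lab": [Candidate("Stanford_AI_Lab", {"stanford"}),
               Candidate("MIT_AI_Lab", {"mit"})],
    "McCarthy": [Candidate("John_McCarthy_(computer_scientist)", {"lisp", "stanford"}),
                 Candidate("John_McCarthy_(linguist)", {"amherst", "phonology"})],
}
result = disambiguate(mentions, {"lisp"})
```

In this toy run, "AI Lab" cannot be resolved in the first round because neither candidate matches the context; once "McCarthy" is resolved and its attributes are added, "AI Lab" becomes resolvable in the second round.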

Background
Wikipedia
Related Problems
Related Work
Proposed method
Heuristic
Disambiguation text following
Next to disambiguation text
Disambiguation text in the same window
Coreference relation
Default referents
Statistical Ranking Model
Disambiguating process
Experiments and evaluation
Conclusion
