Name Disambiguation Problem Research Articles

Many patterns have been uncovered in complex systems through the application of concepts and methodologies of complex networks. Unfortunately, the validity and accuracy of the unveiled patterns depend strongly on the amount of unavoidable noise pervading the data, such as the presence of homonymous individuals in social networks. In the current paper, we investigate the problem of name disambiguation in collaborative networks, a task that plays a fundamental role in a myriad of scientific contexts. In particular, we use an unsupervised technique that relies on a particle competition mechanism in a networked environment to detect the clusters. It has been shown that, in this kind of environment, the learning process can be improved because the network representation of data can capture topological features of the input data set. Specifically, in the proposed disambiguating model, a set of particles is randomly spawned into the nodes constituting the network. As time progresses, the particles employ a movement strategy composed of a probabilistic convex mixture of random and preferential walking policies. In the former, the walking rule depends exclusively on the topology of the network and is responsible for the exploratory behavior of the particles. In the latter, the walking rule depends on both the topology and the domination levels that the particles impose on the neighboring nodes. This type of behavior compels the particles to adopt a defensive strategy, because it forces them to revisit nodes that they already dominate rather than explore rival territories. Computer simulations conducted on networks extracted from the arXiv repository of preprint papers and from other databases reveal the effectiveness of the model, which turned out to be more accurate than traditional clustering methods.
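The walking rule described in this abstract can be illustrated with a minimal sketch. It is a simplified illustration written for this page, not the authors' implementation: the function name `particle_competition`, the mixing weight `lam`, the update step `delta`, and the domination update rule are all assumptions, and the abstract does not specify any of these details. Each particle moves preferentially (toward neighbors it already dominates) with probability `lam` and uniformly at random otherwise.

```python
import random

def particle_competition(adj, n_particles, lam=0.6, delta=0.1, steps=2000, seed=0):
    """Toy particle competition on an unweighted graph (illustrative only).

    adj: dict mapping each node to a non-empty list of neighbors.
    n_particles: number of competing particles (must be >= 2).
    lam, delta, steps, seed: hypothetical parameters chosen for this sketch.
    Returns (labels, dom), where labels[v] is the dominating particle of v.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    # dom[v][k] is the domination level of particle k on node v (rows sum to 1).
    dom = {v: [1.0 / n_particles] * n_particles for v in nodes}
    # Particles are spawned at randomly chosen nodes.
    pos = [rng.choice(nodes) for _ in range(n_particles)]

    for _ in range(steps):
        for k in range(n_particles):
            nbrs = adj[pos[k]]
            if rng.random() < lam:
                # Preferential walk: favor neighbors already dominated by k.
                weights = [dom[u][k] for u in nbrs]
                if sum(weights) > 0:
                    nxt = rng.choices(nbrs, weights=weights)[0]
                else:
                    nxt = rng.choice(nbrs)
            else:
                # Random walk: depends only on the topology.
                nxt = rng.choice(nbrs)
            # Assumed update: rivals lose a small share, particle k gains it.
            gained = 0.0
            for q in range(n_particles):
                if q != k:
                    loss = min(dom[nxt][q], delta / (n_particles - 1))
                    dom[nxt][q] -= loss
                    gained += loss
            dom[nxt][k] += gained
            pos[k] = nxt

    labels = {v: max(range(n_particles), key=lambda k: dom[v][k]) for v in nodes}
    return labels, dom

# Two triangles joined by a single edge (a made-up example graph).
graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
labels, dom = particle_competition(graph, n_particles=2)
print(labels)
```

Because the process is stochastic, the cluster labels depend on the seed and the parameter values; the only property this sketch guarantees is that the domination levels on every node keep summing to one.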

When non-unique values are used as identifiers of entities, homonyms can cause confusion. In particular, when (parts of) the names of entities are used as their identifiers, the problem is often referred to as the name disambiguation problem, where the goal is to sort out the entities confused by name homonyms (e.g., if only the last name is used as the identifier, one cannot distinguish “Masao Obama” from “Norio Obama”). In this paper, we study the scalability issue of the name disambiguation problem, which arises when (1) a small number of entities with large contents or (2) a large number of entities become indistinguishable due to homonyms. First, we carefully examine two state-of-the-art solutions to the name disambiguation problem and point out their limitations with respect to scalability. Then, we propose two scalable graph partitioning algorithms, known as multi-level graph partitioning and multi-level graph partitioning and merging, to solve the large-scale name disambiguation problem. Our claim is empirically validated through experiments: our proposal shows orders-of-magnitude improvement in performance while maintaining equivalent or reasonable accuracy compared to competing solutions.
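The homonym example in this abstract can be made concrete with a small sketch. The helper `group_by_key` and the paper labels `P1` to `P3` are hypothetical and serve only to show how a last-name identifier merges distinct people into a single ambiguous group; the sketch does not implement the partitioning algorithms the abstract proposes.

```python
from collections import defaultdict

def group_by_key(records, key):
    """Group records under the identifier computed by `key`."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    return dict(groups)

# Hypothetical records; the paper labels are placeholders.
records = [
    {"name": "Masao Obama", "paper": "P1"},
    {"name": "Norio Obama", "paper": "P2"},
    {"name": "Masao Obama", "paper": "P3"},
]

# Last name only: both people collapse into one group.
by_last = group_by_key(records, lambda r: r["name"].split()[-1])
# Full name: the two people in this toy data are separated again.
by_full = group_by_key(records, lambda r: r["name"])

print(len(by_last), len(by_full))
```

In real data even full names collide, so each ambiguous group still has to be split by other evidence, and the scalability issue the abstract studies concerns how large such groups can grow.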

Articles published on Name Disambiguation Problem

Network-based stochastic competitive learning approach to disambiguation in collaborative networks

An ontological gazetteer and its application for place name disambiguation in text

Scalable clustering methods for the name disambiguation problem

Clustering web people search results using fuzzy ants

Utilization of external knowledge for personal name disambiguation
