Many patterns have been uncovered in complex systems through the application of concepts and methodologies of complex networks. Unfortunately, the validity and accuracy of the unveiled patterns are strongly dependent on the amount of unavoidable noise pervading the data, such as the presence of homonymous individuals in social networks. In the current paper, we investigate the problem of name disambiguation in collaborative networks, a task that plays a fundamental role on a myriad of scientific contexts. In special, we use an unsupervised technique which relies on a particle competition mechanism in a networked environment to detect the clusters. It has been shown that, in this kind of environment, the learning process can be improved because the network representation of data can capture topological features of the input data set. Specifically, in the proposed disambiguating model, a set of particles is randomly spawned into the nodes constituting the network. As time progresses, the particles employ a movement strategy composed of a probabilistic convex mixture of random and preferential walking policies. In the former, the walking rule exclusively depends on the topology of the network and is responsible for the exploratory behavior of the particles. In the latter, the walking rule depends both on the topology and the domination levels that the particles impose on the neighboring nodes. This type of behavior compels the particles to perform a defensive strategy, because it will force them to revisit nodes that are already dominated by them, rather than exploring rival territories. Computer simulations conducted on the networks extracted from the arXiv repository of preprint papers and also from other databases reveal the effectiveness of the model, which turned out to be more accurate than traditional clustering methods.
Read full abstract