Abstract
With the rapid development of information technology, name ambiguity has become one of the primary problems in information retrieval, data mining, and scientometrics. Name disambiguation uses computing and big-data techniques to map virtual relational networks onto real social networks, resolving cases in which the same name refers to multiple entities. Many literature search platforms have launched their own scholar systems, and on these systems name ambiguity inevitably reduces the precision of other computed information, lowers the credibility of the system, and degrades the quality of its information and content. Most existing work addresses this problem with graph theory and clustering; however, name disambiguation is still not well resolved. In this paper, we propose a multi-level name disambiguation algorithm. The algorithm is mainly unsupervised and combines hierarchical agglomerative clustering (HAC) with graph theory. Experimental results show that the proposed solution clearly outperforms several methods, including HAC and Graph, with a gain of 17% to 25% in F1-measure.
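The abstract describes the algorithm only at a high level: it is unsupervised, multi-level, and combines HAC with graph theory. The sketch below is one possible reading of that idea, not the authors' algorithm. It assumes each paper is a dict with hypothetical `coauthors` and `tokens` fields (for example, venue and title keywords). The first level takes the connected components of a coauthor graph. The second level merges those components with average-linkage HAC over Jaccard similarity, stopping at a hypothetical `threshold`.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def graph_stage(papers: list[dict]) -> list[set[int]]:
    """Level 1 (graph): connect papers that share at least one coauthor and
    return the connected components, found with union-find."""
    parent = list(range(len(papers)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(papers)), 2):
        if papers[i]["coauthors"] & papers[j]["coauthors"]:
            parent[find(i)] = find(j)

    groups: dict[int, set[int]] = {}
    for i in range(len(papers)):
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


def hac_stage(papers: list[dict], clusters: list[set[int]],
              threshold: float = 0.3) -> list[set[int]]:
    """Level 2 (HAC): repeatedly merge the two clusters with the highest
    average-linkage Jaccard similarity of their token sets, stopping when
    the best similarity drops below `threshold` (hypothetical parameter)."""
    clusters = [set(c) for c in clusters]

    def linkage(a: set[int], b: set[int]) -> float:
        sims = [jaccard(papers[i]["tokens"], papers[j]["tokens"])
                for i in a for j in b]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best, (a, b) = max(
            (linkage(clusters[x], clusters[y]), (x, y))
            for x, y in combinations(range(len(clusters)), 2)
        )
        if best < threshold:
            break
        clusters[a] |= clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def disambiguate(papers: list[dict], threshold: float = 0.3) -> list[set[int]]:
    """Run the graph stage, then refine its components with HAC."""
    return hac_stage(papers, graph_stage(papers), threshold)


# Toy input: four papers under one ambiguous name (made-up data).
example_papers = [
    {"coauthors": {"Li Wei"}, "tokens": {"graph", "clustering", "KDD"}},
    {"coauthors": {"Li Wei", "Zhang Min"}, "tokens": {"graph", "mining"}},
    {"coauthors": {"Chen Yu"}, "tokens": {"graph", "clustering", "mining"}},
    {"coauthors": {"Wang Fang"}, "tokens": {"protein", "folding"}},
]
print(sorted(sorted(c) for c in disambiguate(example_papers)))  # [[0, 1, 2], [3]]
```

The shared coauthor joins papers 0 and 1. Paper 2 joins them during the HAC stage through overlapping tokens, and paper 3 stays separate. Running the graph stage first fixes the high-precision coauthor links, so HAC only has to decide merges between components.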
Highlights
According to the National Science Foundation's "Science & Engineering Indicators" 2018 report [1], China published 426,165 academic papers, the largest number in the world, surpassing the United States (408,985) for the first time
Building on previous research, this paper proposes a method for author name disambiguation
We considered several baseline methods based on Hierarchical Agglomerative Clustering (HAC) [2] and Graph [3]
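The abstract compares the proposed method against these baselines using F1-measure, but it does not say which F1 variant is used. A common choice in author name disambiguation is pairwise F1, which scores every pair of papers by whether the pair is placed in the same cluster. The sketch below assumes that variant. Predictions and ground truth are given as hypothetical dicts that map paper IDs to cluster labels.

```python
from itertools import combinations


def same_cluster_pairs(labels: dict[str, object]) -> set[tuple[str, str]]:
    """All unordered pairs of paper IDs that share a cluster label."""
    members_by_label: dict[object, list[str]] = {}
    for paper_id, label in labels.items():
        members_by_label.setdefault(label, []).append(paper_id)
    pairs = set()
    for members in members_by_label.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


def pairwise_f1(predicted: dict[str, object],
                truth: dict[str, object]) -> tuple[float, float, float]:
    """Pairwise precision, recall, and F1 of a clustering vs. ground truth."""
    pred_pairs = same_cluster_pairs(predicted)
    true_pairs = same_cluster_pairs(truth)
    tp = len(pred_pairs & true_pairs)  # pairs correctly placed together
    precision = tp / len(pred_pairs) if pred_pairs else 0.0
    recall = tp / len(true_pairs) if true_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy example: p1-p3 belong to one real author, p4 to another.
truth = {"p1": "A", "p2": "A", "p3": "A", "p4": "B"}
predicted = {"p1": 0, "p2": 0, "p3": 1, "p4": 1}
print(tuple(round(x, 3) for x in pairwise_f1(predicted, truth)))  # (0.5, 0.333, 0.4)
```

In this toy case the prediction proposes two pairs, and one of them is correct (precision 0.5). It recovers one of the three true pairs (recall 1/3), which gives an F1 of 0.4.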
Summary
According to the National Science Foundation's "Science & Engineering Indicators" 2018 report [1], China published 426,165 academic papers, the largest number in the world, surpassing the United States (408,985) for the first time. When a scholar database needs to compute scholars' influence, the number of papers by an author, or other information, it is currently difficult to accurately distinguish scholars who share the same name. Building on previous research, this paper proposes a method for author name disambiguation. The algorithm is mainly unsupervised and combines hierarchical clustering with graph theory. The structure of this paper is as follows. In Sect., we introduce related research on name disambiguation. This part mainly summarizes past work and the background of the author name disambiguation method proposed in this paper. Another part proposes work to be done in the future.