Abstract
Entity disambiguation is the task of determining which underlying real-world entity each occurrence of a shared surface form refers to. It arises in information integration, document retrieval, web search, and many other applications. Because entities in most real-world data carry both textual information and inter-object relationships, we propose an unsupervised iterative similarity propagation algorithm to disambiguate them. We first select the entity pairs that share a surface form as probable matching candidates, and construct a connection graph that takes these candidate pairs as nodes and builds edges from the inter-object relationships. The more similar the textual information of the two records in a candidate pair, the more likely the two records correspond to the same real-world entity, so we use the textual similarity score as the initial value of the iterative method. The similarity of each entity pair is then propagated over the connection graph. When the iteration terminates, we identify the pairs whose final similarity scores exceed a given threshold as true matches. We apply the method to author disambiguation in publication records. Experimental results on the real DBLP digital library data set demonstrate its effectiveness.
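The pipeline described above can be illustrated with a minimal Python sketch. The abstract does not state the textual similarity measure, the edge rule, or the propagation update, so everything concrete here is an assumption made for illustration: token-set Jaccard similarity stands in for the textual score, two candidate pairs are connected when their records are linked crosswise (for example, by co-authorship), and each iteration blends a pair's initial score with the mean score of its neighbours using a hypothetical damping weight `alpha`. The record fields (`name`, `text`), the `related` set, and all function names are likewise hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Token-set Jaccard similarity (assumed stand-in for the textual score)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def candidate_pairs(records):
    """Pair up records that share a surface form (here, the `name` field)."""
    by_name = {}
    for rid, rec in records.items():
        by_name.setdefault(rec["name"], []).append(rid)
    return [tuple(sorted(p)) for ids in by_name.values()
            for p in combinations(ids, 2)]


def build_graph(pairs, related):
    """Connect two candidate pairs when their records are linked crosswise
    by an inter-object relationship (assumed edge rule)."""
    def linked(x, y):
        return (x, y) in related or (y, x) in related

    graph = {p: set() for p in pairs}
    for p, q in combinations(pairs, 2):
        (a, b), (c, d) = p, q
        if (linked(a, c) and linked(b, d)) or (linked(a, d) and linked(b, c)):
            graph[p].add(q)
            graph[q].add(p)
    return graph


def propagate(initial, graph, alpha=0.5, tol=1e-6, max_iter=100):
    """Iterate until scores change by less than `tol` (assumed update rule):
    new score = (1 - alpha) * initial score + alpha * mean neighbour score."""
    scores = dict(initial)
    for _ in range(max_iter):
        updated = {}
        for p, nbrs in graph.items():
            if nbrs:
                mean = sum(scores[q] for q in nbrs) / len(nbrs)
                updated[p] = (1 - alpha) * initial[p] + alpha * mean
            else:
                updated[p] = initial[p]  # isolated pairs keep their textual score
        delta = max((abs(updated[p] - scores[p]) for p in graph), default=0.0)
        scores = updated
        if delta < tol:
            break
    return scores


def disambiguate(records, related, threshold=0.5, alpha=0.5):
    """Return the pairs whose final score exceeds `threshold`, plus all scores."""
    pairs = candidate_pairs(records)
    graph = build_graph(pairs, related)
    initial = {(a, b): jaccard(records[a]["text"], records[b]["text"])
               for a, b in pairs}
    final = propagate(initial, graph, alpha)
    return {p for p, s in final.items() if s > threshold}, initial, final


# Toy data (invented for illustration): two "J. Smith" records and two
# "A. Lee" records, with r1-r3 and r2-r4 co-authoring a paper each.
records = {
    "r1": {"name": "J. Smith", "text": "graph mining similarity propagation"},
    "r2": {"name": "J. Smith", "text": "similarity propagation on graphs"},
    "r3": {"name": "A. Lee", "text": "entity resolution databases"},
    "r4": {"name": "A. Lee", "text": "entity resolution databases"},
}
related = {("r1", "r3"), ("r2", "r4")}
matches, initial, final = disambiguate(records, related)
```

In this toy example the two "J. Smith" records have a textual similarity below the threshold on their own, but their co-authors form a strongly matching pair, so propagation lifts the score above the threshold. That is the behaviour the abstract motivates: relational evidence reinforcing weak textual evidence. With a damping weight below 1 the assumed update is a contraction, which is why this particular sketch converges; the paper's actual rule may differ.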