Abstract

Duplicate entities can seriously degrade data quality. Despite remarkable recent progress, existing methods still produce a large number of false positives (i.e., entities determined to be duplicates when they are not), which impair accuracy. To address this challenge, we propose a novel node resistance-based probability model in which we view a given data set as a graph of entities linked to each other via relationships, and compute a probability value between two entities to measure how similar they are. Specifically, each node in the graph has its own resistance value, defined as $1 - \mathrm{confidence}$ (normalized to $[0, 1]$), and the quantity $\mathrm{resistance} \cdot \mathrm{probability}$ is filtered out at each node while the probability value is computed. To evaluate the proposed model, we conducted extensive experiments on several data sets, including ACM ( https://dl.acm.org ), DBLP ( https://dblp.uni-trier.de ), and IMDB ( https://imdb.com ). Our experimental results show that the proposed probability model outperforms the existing probability model, improving average F1 scores by up to 14% and never worsening them.

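The abstract only sketches the model, so the following is a minimal, hypothetical Python illustration of one plausible reading of the resistance-based filtering idea: probability mass flows between entities along relationship edges, and at each node a fraction equal to its resistance ($1 - \mathrm{confidence}$) is filtered out. All names (`EntityGraph`, `similarity_probability`, `max_hops`) and the specific propagation rule are assumptions, not the paper's actual algorithm.

```python
from collections import defaultdict

class EntityGraph:
    """A toy entity graph; nodes carry a confidence in [0, 1]."""

    def __init__(self):
        self.edges = defaultdict(list)   # node -> list of (neighbor, edge weight)
        self.confidence = {}             # node -> confidence in [0, 1]

    def add_node(self, node, confidence):
        self.confidence[node] = confidence

    def add_edge(self, u, v, weight=1.0):
        self.edges[u].append((v, weight))
        self.edges[v].append((u, weight))

    def resistance(self, node):
        # Per the abstract, each node's resistance is 1 - confidence.
        return 1.0 - self.confidence[node]

def similarity_probability(graph, source, target, max_hops=3):
    """Estimate how similar two entities are by propagating probability
    mass from `source` toward `target`, filtering out
    resistance * probability at every node it passes through."""
    frontier = {source: 1.0}  # node -> probability mass currently at it
    reached = 0.0
    for _ in range(max_hops):
        next_frontier = defaultdict(float)
        for node, p in frontier.items():
            # Filter out resistance * p at this node; the rest passes on.
            p_pass = p * (1.0 - graph.resistance(node))
            total_w = sum(w for _, w in graph.edges[node]) or 1.0
            for neighbor, w in graph.edges[node]:
                share = p_pass * w / total_w
                if neighbor == target:
                    reached += share
                else:
                    next_frontier[neighbor] += share
        frontier = next_frontier
    return reached

# Usage: two paper entities linked through a shared author node.
g = EntityGraph()
g.add_node("paper_a", confidence=0.9)
g.add_node("author_x", confidence=0.8)
g.add_node("paper_b", confidence=0.9)
g.add_edge("paper_a", "author_x")
g.add_edge("author_x", "paper_b")
print(similarity_probability(g, "paper_a", "paper_b"))
```

Under this reading, low-confidence nodes attenuate any probability routed through them, which is what suppresses false positives: a match supported only by unreliable intermediate entities ends up with a low similarity value.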