Splitting Tuples of Mismatched Entities

Wenfei Fan, Min Xie, Weilong Ren, Mengyi Yan, Ziyan Han, Ding Wang, Yaoshu Wang

doi:10.1145/3626763

Abstract

There has been a host of work on entity resolution (ER), to identify tuples that refer to the same entity. This paper studies the inverse of ER, to identify tuples to which distinct real-world entities are matched by mistake, and split such tuples into a set of tuples, one for each entity. We formulate the tuple splitting problem. We propose a scheme to decide what tuples to split and what tuples to correct without splitting, fix errors/assign attribute values to the split tuples, and impute missing values. The scheme introduces a class of rules, which embed predicates for aligning entities across relations and knowledge graphs G, assessing correlation between attributes, and extracting data from G. It unifies logic deduction, correlation models, and data extraction by chasing the data with the rules. We train machine learning models to assess attribute correlation and predict missing values. We develop algorithms for the tuple splitting scheme. Using real-life data, we empirically verify that the scheme is efficient and accurate, with F-measure 0.92 on average.

Journal: Proceedings of the ACM on Management of Data
Publication Date: Dec 8, 2023
Citations: 1

Similar Papers

A novel and efficient risk minimisation-based missing value imputation algorithm
Yu-Lin He ... Joshua Zhexue Huang
Knowledge-Based Systems | VOL. 304
28 Aug 2024

Adversarial Joint-Learning Recurrent Neural Network for Incomplete Time Series Classification.
Qianli Ma ... Garrison W Cottrell
IEEE Transactions on Pattern Analysis and Machine Intelligence | VOL. 44
30 Sep 2020

Incremental entity resolution process over query results for data integration systems
Priscilla Kelly Machado Vieira ... Bernadette Farias Lóscio
Journal of Intelligent Information Systems | VOL. 52
29 Jan 2019

Handling missing values: A study of popular imputation packages in R
Madan Lal Yadav ... Basav Roychoudhury
Knowledge-Based Systems | VOL. 160
06 Jul 2018
