Abstract

Entity matching is an issue of interest in information integration and data cleaning. Because representations of the same entity vary, it is often impossible to fully automate entity matching, and human input is required. This raises a question: how can human resources be integrated into entity matching so that matching quality is guaranteed while the cost of those human resources is minimized? In this paper, we propose BUBBLE, a novel human-in-the-loop entity matching framework that hybridizes Bayesian inference and crowdsourcing. To guarantee entity matching quality, Bayesian inference is used to determine whether a given matching requires crowdsourcing. We show that a Bayesian error rate can be defined for this problem. For optimization, we use metric learning and select candidate matching pairs by nearest-neighbor search in the learned embedding space, and we construct a k-nearest neighbor graph to avoid redundant matching. We applied BUBBLE to a bibliographic data matching problem on the National Diet Library. The experimental results show that, for the same number of tasks assigned to humans, BUBBLE's task assignment yields higher-quality results than the compared assignments. The results also show that our optimization scheme is effective without sacrificing quality.
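
The pipeline summarized above can be illustrated with a minimal Python sketch. It is not the BUBBLE implementation: the function names (`knn_candidates`, `candidate_pairs`, `route`), the brute-force Euclidean search, the `max_error` threshold, and the idea of receiving a ready-made posterior match probability are all assumptions made for illustration. The abstract does not specify the Bayesian model, the metric-learning method, or how the k-nearest neighbor graph is used beyond avoiding redundant matching.

```python
import numpy as np


def knn_candidates(embeddings, k):
    """Return, for each record, the indices of its k nearest neighbors.

    Assumption: embeddings come from some learned metric; here we simply
    run a brute-force Euclidean search over them.
    """
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a record is not its own candidate
    return np.argsort(dists, axis=1)[:, :k]


def candidate_pairs(neighbors):
    """Collapse directed kNN edges into unordered pairs.

    Assumption: "avoiding redundant matching" is read here as examining
    each unordered pair at most once.
    """
    pairs = set()
    for i, row in enumerate(neighbors):
        for j in row:
            pairs.add((min(i, int(j)), max(i, int(j))))
    return sorted(pairs)


def route(posterior_match, max_error):
    """Decide automatically only if the Bayes error is small enough.

    posterior_match is an assumed input: P(match | evidence) from some
    Bayesian model. The error of the best automatic decision is
    min(p, 1 - p); pairs above the hypothetical max_error go to the crowd.
    """
    bayes_error = min(posterior_match, 1.0 - posterior_match)
    if bayes_error > max_error:
        return "crowd"
    return "match" if posterior_match >= 0.5 else "non-match"


# Toy data: two tight clusters of two records each.
embeddings = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
neighbors = knn_candidates(embeddings, k=1)
pairs = candidate_pairs(neighbors)
```

In this toy setup only the two within-cluster pairs become candidates, and a pair with posterior 0.6 would be routed to the crowd under `max_error=0.05`, while posteriors of 0.97 or 0.02 would be decided automatically.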
