Abstract

This article proposes a notion of parametric simulation to link entities across a relational database 𝒟 and a graph G. Taking as parameters functions and thresholds for measuring vertex closeness, path associations, and important properties, parametric simulation identifies tuples t in 𝒟 and vertices v in G that refer to the same real-world entity, based on both topological and semantic matching. We develop machine learning methods to learn the parameter functions and thresholds. We show that parametric simulation can be computed in quadratic time by providing an algorithm with that bound. Moreover, we develop an incremental algorithm for parametric simulation; we show that the incremental algorithm is bounded relative to its batch counterpart, i.e., it incurs the minimum cost for incrementalizing the batch algorithm. Putting these together, we develop HER, a parallel system to check whether (t, v) is a match, find all vertex matches of t in G, and compute all matches across 𝒟 and G, all in quadratic time; moreover, HER supports incremental computation of these in response to updates to 𝒟 and G. Using real-life and synthetic data, we empirically verify that HER is accurate, with an F-measure of 0.94 on average, and scales with database 𝒟 and graph G for both batch and incremental computations.

