Iterative Entity Resolution

Vassilis Christophides, Vasilis Efthymiou, Kostas Stefanidis

doi:10.1007/978-3-031-79468-1_4

Abstract

As we have seen in Chapter 2.1, to minimize the number of missed matches, an iterative entity resolution (ER) process can progressively exploit the intermediate results of blocking and matching to discover new candidate description pairs for resolution, even if this entails additional processing cost. The main objective of algorithms for iterative ER, a process described abstractly in Section 4.1, is to identify matches based on knowledge gained from previously identified matches. We distinguish between merging-based (Section 4.2) and relationship-based (Section 4.3) iterative ER approaches. In the former, new matches are identified by exploiting the merging of previously located matches, while in the latter, iterations rely on the similarity evidence provided by descriptions that are structurally related in the original entity graph. As we will see in Section 4.4, iterative ER can also be interleaved with blocking: matches are sought only within a block and, once identified, are propagated to other blocks. Finally, in Section 4.5, we survey work on incremental ER, in which the ER results obtained at each phase are enriched as new descriptions become available (possibly as streams), and in Section 4.6, we present recent work on progressive ER, which attempts to discover as many matches as possible within a limited computing budget by estimating the matching likelihood of yet unresolved descriptions based on the matches found so far.
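To make the merging-based idea concrete, the following is a minimal sketch and not the chapter's own algorithm. It assumes that each description is a dictionary mapping attribute names to sets of values, that two descriptions match when they share a value for some common attribute, and that merging takes the union of their values. The names `matches`, `merge`, and `iterative_er`, as well as the sample data, are hypothetical and chosen only for illustration.

```python
def matches(a, b):
    # Hypothetical match rule (an assumption): two descriptions match
    # if they share at least one value for some common attribute.
    return any(a[k] & b[k] for k in a.keys() & b.keys())


def merge(a, b):
    # Hypothetical merge rule (an assumption): union the value sets
    # attribute by attribute.
    return {k: a.get(k, set()) | b.get(k, set()) for k in a.keys() | b.keys()}


def iterative_er(descriptions):
    # Descriptions still to be compared against the resolved set.
    pending = [dict(d) for d in descriptions]
    # Descriptions that currently match nothing else in this list.
    resolved = []
    while pending:
        current = pending.pop()
        partner = next((r for r in resolved if matches(current, r)), None)
        if partner is None:
            resolved.append(current)
        else:
            # The merged description goes back to pending, so the extra
            # evidence it carries can trigger further matches.
            resolved.remove(partner)
            pending.append(merge(current, partner))
    return resolved


d1 = {"name": {"J. Smith"}, "email": {"js@example.org"}}
d2 = {"name": {"John Smith"}, "email": {"js@example.org"}}
d3 = {"name": {"John Smith"}, "phone": {"555-0100"}}
d4 = {"name": {"Ann Lee"}}

result = iterative_er([d1, d2, d3, d4])
```

In this toy input, `d1` and `d3` share no value and would never be matched by a single pairwise pass. They end up in the same merged description only because each of them matches `d2`, which is the kind of match that an iterative process recovers from previously identified matches.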

Full Text