Abstract

Record linkage deals with finding records that identify the same real-world entity, such as an individual or a business, in a given file or set of files. The record linkage problem is also referred to as the entity resolution or record recognition problem. To locate the records that identify the same real-world entity, pairwise record analyses must, in principle, be performed among all records. Analytical operations between two records range from comparing corresponding fields to enhancing records through large knowledge bases and querying large databases. Hence, these operations are complex and time-consuming. To reduce the number of pairwise record comparisons, blocking techniques are introduced to partition the records into blocks, after which only the records within each block are analyzed against one another. One effective blocking method is the closure approach, in which a “related” equivalence relation is used to partition the records into equivalence classes. This paper introduces the closure problem and describes the design and implementation of a parallel and distributed closure prototype system running in an enterprise grid.
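
The closure idea can be illustrated with a minimal single-machine sketch. It assumes that two records are directly related when they share at least one key value, and it uses a union-find structure to take the transitive closure of that relation. The function names and the "phone" and "email" key fields are hypothetical, chosen only for illustration; the abstract does not specify the relation the paper uses, and this sketch does not attempt to reproduce the parallel and distributed prototype.

```python
from collections import defaultdict


def closure_blocks(records, keys_of):
    """Partition records into the equivalence classes induced by the
    transitive closure of "shares at least one key value".

    keys_of is a hypothetical callable returning hashable key values.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the class representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # key value -> index of the first record carrying it
    for i, rec in enumerate(records):
        for key in keys_of(rec):
            j = first_seen.setdefault(key, i)
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj  # merge the two classes

    blocks = defaultdict(list)
    for i, rec in enumerate(records):
        blocks[find(i)].append(rec)
    return list(blocks.values())


def example_keys(rec):
    # Assumed key fields, for illustration only; empty values are skipped.
    return {(field, rec[field]) for field in ("phone", "email") if rec.get(field)}


records = [
    {"id": 1, "phone": "555-0100", "email": "x@example.com"},
    {"id": 2, "phone": "555-0100", "email": "y@example.com"},
    {"id": 3, "phone": "555-0111", "email": "y@example.com"},
    {"id": 4, "phone": "555-0199", "email": "z@example.com"},
]

blocks = closure_blocks(records, example_keys)
for block in blocks:
    print(sorted(rec["id"] for rec in block))
```

In this example records 1 and 3 share no key value directly, yet they land in the same block because each is related to record 2. Expensive pairwise analysis then needs to run only inside each block, not across all records.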
