Abstract

Record linkage deals with finding records that identify the same real-world entity, such as an individual or a business, within a given file or set of files, and it has many applications. This problem is also referred to as the entity resolution or record recognition problem. In principle, locating the records that identify the same real-world entity requires pairwise analysis of all records. Each analysis is complex and time-consuming, and the number of analyses grows quadratically with the number of records (n records require n(n-1)/2 comparisons), so the overall process is very expensive. To reduce the number of pairwise comparisons, blocking techniques partition the records into blocks, and only records within the same block are analyzed against one another. One effective blocking method is the closure approach. In this paper, we describe the design and implementation of a parallel and distributed closure prototype system running in an enterprise grid. The system can either write all closures to a file in batch mode or run as a service that, upon receiving a record, returns the closure of that record. A preliminary experiment indicates that the approach is efficient and scalable.
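The abstract does not define how closures are formed. The sketch below assumes one common reading: two records belong to the same closure if they are connected by a chain of shared blocking-key values (the transitive closure of the "shares a key" relation), computed here with a union-find structure. The key function `blocking_keys`, the field names `last` and `phone`, and the sample records are all hypothetical. The sketch runs sequentially on one machine; it does not reproduce the parallel, grid-distributed system the paper describes.

```python
from collections import defaultdict


def blocking_keys(record):
    # Hypothetical blocking keys: normalized last name and phone number.
    return (("last", record["last"].lower()), ("phone", record["phone"]))


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def closures(records, keys_fn):
    """Group record indices into closures: records sharing any key value,
    directly or through a chain of other records, end up together."""
    uf = UnionFind(len(records))
    first_seen = {}  # key value -> index of the first record carrying it
    for i, rec in enumerate(records):
        for key in keys_fn(rec):
            if key in first_seen:
                uf.union(first_seen[key], i)
            else:
                first_seen[key] = i
    groups = defaultdict(list)
    for i in range(len(records)):
        groups[uf.find(i)].append(i)
    return sorted(groups.values())


def closure_of(all_closures, index):
    # Service-style lookup (hypothetical helper): the closure containing a record.
    return next(c for c in all_closures if index in c)


def pairwise_count(n):
    # Comparisons needed to analyze every pair among n records.
    return n * (n - 1) // 2


def blocked_comparisons(blocks):
    # Comparisons needed when only records inside the same block are analyzed.
    return sum(pairwise_count(len(b)) for b in blocks)


# Hypothetical sample records.
RECORDS = [
    {"last": "Smith", "phone": "555-0101"},  # 0
    {"last": "smith", "phone": "555-0199"},  # 1: same last name as 0
    {"last": "Jones", "phone": "555-0199"},  # 2: same phone as 1, so linked to 0 too
    {"last": "Brown", "phone": "555-0300"},  # 3: shares nothing
    {"last": "Lee", "phone": "555-0400"},    # 4
    {"last": "lee", "phone": "555-0500"},    # 5: same last name as 4
]

if __name__ == "__main__":
    result = closures(RECORDS, blocking_keys)
    print(result)                              # [[0, 1, 2], [3], [4, 5]]
    print(pairwise_count(len(RECORDS)))        # 15 comparisons without blocking
    print(blocked_comparisons(result))         # 4 comparisons within closures
    print(closure_of(result, 2))               # [0, 1, 2]
```

In this toy run, all-pairs analysis needs 15 comparisons, while comparing only within closures needs 4. Record 2 lands in the same closure as record 0 even though they share no key directly, which is the transitive behavior the closure reading assumes.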
