Abstract

Introduction
Probabilistic record linkage of large databases requires a substantial amount of time and resources, resulting in significant costs. In addition, the process is subject to error, particularly during manual grey-area resolution of uncertain matched pairs.

Objectives and Approach
The objective of this semi-experimental study was to compare the accuracy and efficiency of different record linkage approaches. Four record linkage software packages were selected: AutoMatch, G-Link, SAS Data Quality (DataFlux) and LinxMart. A large data set containing all required linkage variables (e.g., first and last name, date of birth and gender) and a unique identifier in common with the ICES linkage spine (registry) was chosen to represent our ground truth. Four non-overlapping cohorts were randomly selected from this data source, representing small (n=10,000), medium (n=250,000) and large (n=5,000,000) data sets. Simulated errors were inserted into each cohort to represent a real linkage scenario. The smallest cohort was used to run a complete record linkage with each software package. Where the software allowed for manual grey-area resolution, the linkage was replicated by two different linkage analysts who were blinded to the simulated errors included in the data set. The time each analyst spent on processing, programming and manual grey-area resolution was recorded. The larger cohorts were used to measure the accuracy and processing time of each software package. To analyse possible errors, detailed output was generated from each software package so that accepted and rejected pairs could be compared with our ground truth.

Results
This project is still ongoing. Evaluation of AutoMatch, G-Link and SAS Data Quality has largely been completed. The remaining analyses will be completed by August 2020.

Conclusion / Implications
The outcome of this project can inform the record linkage strategy at organizations and data centres such as ICES and help identify more efficient methods that preserve an acceptable level of accuracy for their needs.
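
The evaluation design described in the approach (inserting simulated errors, then comparing accepted pairs with the ground truth) can be illustrated with a minimal Python sketch. The abstract does not state which error types were simulated or which accuracy measures were used, so the adjacent-character transposition and the precision, recall and F1 measures below are assumptions chosen for illustration. The function names `inject_typo`, `corrupt_records` and `score_linkage` are hypothetical and do not belong to any of the software packages named above.

```python
import random


def inject_typo(value, rng):
    """Swap two adjacent characters (assumed error type: transposition)."""
    if len(value) < 2:
        return value
    i = rng.randrange(len(value) - 1)
    chars = list(value)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def corrupt_records(records, fields, error_rate, seed=0):
    """Return copies of the records with simulated errors in some fields.

    Each listed field of each record is corrupted independently with
    probability `error_rate`. The input records are left unchanged.
    """
    rng = random.Random(seed)
    corrupted = []
    for rec in records:
        new_rec = dict(rec)
        for field in fields:
            if rng.random() < error_rate:
                new_rec[field] = inject_typo(new_rec[field], rng)
        corrupted.append(new_rec)
    return corrupted


def score_linkage(accepted_pairs, true_pairs):
    """Compare accepted pairs with ground-truth pairs.

    Both arguments are iterables of (cohort_id, spine_id) tuples.
    """
    accepted = set(accepted_pairs)
    truth = set(true_pairs)
    tp = len(accepted & truth)   # correctly accepted links
    fp = len(accepted - truth)   # false links
    fn = len(truth - accepted)   # missed links
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    cohort = [{"id": 1, "first": "MARIA", "last": "LOPEZ"},
              {"id": 2, "first": "JOHN", "last": "SMITH"}]
    print(corrupt_records(cohort, ["first", "last"], error_rate=0.5, seed=42))

    truth = [(1, "A"), (2, "B"), (4, "D")]
    accepted = [(1, "A"), (2, "B"), (3, "X")]
    print(score_linkage(accepted, truth))
```

Because the true link for every record is known through the shared unique identifier, the same scoring function could in principle be applied to the output of each package, which keeps the accuracy comparison independent of how each tool reports its results.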

