Matching and record linkage

William E Winkler

doi:10.1002/wics.1317

Abstract

This overview gives background on a number of statistical methods that have proven effective for record linkage. To prepare data for the main computational algorithms, we need parsing/standardization that allows us to structure free‐form names, addresses, and other fields into corresponding components. The main parameter‐estimation methods are unsupervised methods that yield ‘optimal’ record linkage parameters. Extended methods provide estimates of false match rates in both unsupervised and, with greater accuracy, semi‐supervised situations. Finally, the paper describes ongoing research on adjusting standard statistical analyses for linkage error.

WIREs Comput Stat 2014, 6:313–325. doi: 10.1002/wics.1317

This article is categorized under:
Statistical and Graphical Methods of Data Analysis > EM Algorithm
Algorithms and Computational Methods > Seminumerical and Nonnumerical Methods
Data: Types and Structure > Data Preparation and Processing
Statistical and Graphical Methods of Data Analysis > Markov Chain Monte Carlo (MCMC)