Abstract
Since its post-World War II inception, the science of record linkage has grown exponentially and is used across industrial, governmental, and academic agencies. The academic fields that rely on record linkage are diverse, ranging from history to public health to demography. In this paper, we introduce the different types of data linkage and give a historical context to their development. We then introduce the three types of underlying models for probabilistic record linkage: Fellegi-Sunter-based methods, machine learning methods, and Bayesian methods. Practical considerations, such as data standardization and privacy concerns, are then discussed. Finally, recommendations are given for organizations developing or maintaining record linkage programs, with an emphasis on organizations measuring long-term complications of disasters, such as 9/11.
Highlights
From its humble beginnings in post-World War II public health research, the field of “record linkage,” that is, the matching of records for unique entities across one or more lists, has exploded into a multi-field research focus
Record linkage as a field originated at the end of World War II; the original papers concerned family structure in the United States and elsewhere [1,2,3] and a population registry in Canada [4]
Current research topics related to these concerns revolve around privacy-preserving record linkage and understanding the bias introduced by the requirement for informed consent [26,27]
Summary
From its humble beginnings in post-World War II public health research, the field of “record linkage,” that is, the matching of records for unique entities (typically people, but sometimes organizations, addresses, or something else) across one or more lists, has exploded into a multi-field research focus (see Figure 1). Several joint studies are being formulated to study pooled patient populations across cohorts. The reasons for this are both scientific and practical. As more data become available electronically and computational power improves, access to health data, at least from a technical point of view, has become easier. This is fortuitous as maintaining a large-scale research project over multiple decades among a trauma-exposed and aging population presents several challenges, chief among them attrition and reporting bias due to failing memories among respondents.