Abstract

Background: Linkage of electronic healthcare records is becoming increasingly important for research purposes. However, linkage error due to mis-recorded or missing identifiers can lead to biased results. We evaluated the impact of linkage error on estimated infection rates using two different methods for classifying links: highest-weight (HW) classification using probabilistic match weights and prior-informed imputation (PII) using match probabilities.

Methods: A gold-standard dataset was created through deterministic linkage of unique identifiers in admission data from two hospitals and infection data recorded at the hospital laboratories (original data). Unique identifiers were then removed and data were re-linked by date of birth, sex and Soundex using two classification methods: i) HW classification - accepting the candidate record with the highest weight exceeding a threshold and ii) PII - imputing values from a match probability distribution. To evaluate methods for linking data with different error rates, non-random error and different match rates, we generated simulation data. Each set of simulated files was linked using both classification methods. Infection rates in the linked data were compared with those in the gold-standard data.

Results: In the original gold-standard data, 1496/20924 admissions linked to an infection. In the linked original data, PII provided the least biased results: 1481 and 1457 infections (upper/lower thresholds) compared with 1316 and 1287 (HW upper/lower thresholds). In the simulated data, substantial bias (up to 112%) was introduced when linkage error varied by hospital. Bias was also greater when the match rate was low or the identifier error rate was high and, in these cases, PII performed better than HW classification at reducing bias due to false-matches.

Conclusions: This study highlights the importance of evaluating the potential impact of linkage error on results. PII can help incorporate linkage uncertainty into analysis and reduce bias due to linkage error, without requiring identifiers.
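
The counts reported in the Results can be put on a common scale by expressing each linked count as a percentage difference from the gold-standard count. The sketch below is illustrative arithmetic only: the function name `relative_difference` and the dictionary labels are assumptions, the upper/lower labels follow the order in which the counts are quoted, and this simple difference in counts is not necessarily the bias measure used in the study.

```python
# Illustrative arithmetic only; names and labels are assumptions, not the study's code.
GOLD_STANDARD_INFECTIONS = 1496  # admissions linked to an infection in the gold-standard data

# Linked counts as quoted in the abstract, labelled in the order they are reported.
linked_counts = {
    "PII upper": 1481,
    "PII lower": 1457,
    "HW upper": 1316,
    "HW lower": 1287,
}


def relative_difference(linked: int, gold: int) -> float:
    """Percentage difference of a linked count from the gold-standard count."""
    return 100.0 * (linked - gold) / gold


if __name__ == "__main__":
    for label, count in linked_counts.items():
        diff = relative_difference(count, GOLD_STANDARD_INFECTIONS)
        print(f"{label}: {count} infections ({diff:+.1f}% vs gold standard)")
```

Under this measure, all four linked counts fall below the gold-standard count, with the PII counts closer to it than the HW counts.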

Highlights

  • Linkage of electronic healthcare records is becoming increasingly important for research purposes

  • We evaluate the impact of linkage error on analysis of infection rates in paediatric intensive care, based on a national audit dataset (PICANet, the Paediatric Intensive Care Audit Network) and infection surveillance data linked using highest-weight (HW) classification and prior-informed imputation (PII) [18]

  • The crude rate of PICU-acquired blood-stream infection (BSI) was identified as 11.33, 11.08 (10.48-11.69), 12.75 (11.61-13.89) and 12.55 (11.42-13.68) for HW threshold 1, HW threshold 2, PII 0.1 and PII 0.9 respectively

Introduction

Linkage of records between electronic health databases is becoming increasingly important for research purposes as individual-level electronic information can be combined relatively quickly and inexpensively [1,2]. The success of such data linkage depends on data quality, linkage methods, and the ultimate purpose of the linked data [3]. Errors that occur during the linkage process (false-matches and missed-matches), for example due to mis-recorded or missing identifiers, can lead to biased results. As previously reported, the extent of this bias in research based on linked data is difficult to measure. The choice of thresholds directly affects the number of false-matches and missed-matches in linked data.
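
The effect of threshold choice on false-matches and missed-matches can be illustrated with a minimal sketch of highest-weight classification, in which the candidate record with the highest match weight is accepted only if that weight exceeds the threshold. Everything in the example is hypothetical: the record identifiers, match weights, thresholds and the helper names `highest_weight_link` and `count_errors` are made up for illustration and are not taken from the study.

```python
from typing import Dict, Optional, Tuple


def highest_weight_link(candidates: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the candidate with the highest match weight if it exceeds the threshold."""
    if not candidates:
        return None
    best_id = max(candidates, key=candidates.get)
    return best_id if candidates[best_id] > threshold else None


# Hypothetical toy data: admission -> (candidate match weights, true match or None).
toy_data: Dict[str, Tuple[Dict[str, float], Optional[str]]] = {
    "adm1": ({"labA": 12.0, "labB": 3.0}, "labA"),  # true match with a high weight
    "adm2": ({"labC": 6.0}, "labC"),                # true match with a moderate weight
    "adm3": ({"labD": 7.0}, None),                  # no true match; chance agreement
    "adm4": ({"labE": 1.0}, None),                  # no true match; low weight
}


def count_errors(data, threshold: float) -> Tuple[int, int]:
    """Count (false-matches, missed-matches) at a given threshold."""
    false_matches = 0
    missed_matches = 0
    for candidates, truth in data.values():
        link = highest_weight_link(candidates, threshold)
        if link is not None and link != truth:
            false_matches += 1   # a link was accepted that is not the true match
        if truth is not None and link != truth:
            missed_matches += 1  # the true match was not linked
    return false_matches, missed_matches


if __name__ == "__main__":
    for threshold in (5.0, 10.0):
        fm, mm = count_errors(toy_data, threshold)
        print(f"threshold={threshold}: false-matches={fm}, missed-matches={mm}")
```

In this toy data, the lower threshold accepts a chance agreement as a link (a false-match), while the higher threshold rejects a true match with a moderate weight (a missed-match).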
