Abstract

BackgroundStudies based on high-quality linked data in developed countries show that even minor linkage errors, which occur when records of two different individuals are erroneously linked or when records belonging to the same individual are not linked, can impact bias and precision of subsequent analyses. We evaluated the impact of linkage quality on inferences drawn from analyses using data with substantial linkage errors in rural Tanzania.MethodsSemi-automatic point-of-contact interactive record linkage was used to establish gold standard links between community-based HIV surveillance data and medical records at clinics serving the surveillance population. Automated probabilistic record linkage was used to create analytic datasets at minimum, low, medium, and high match score thresholds. Cox proportional hazards regression models were used to compare HIV care registration rates by testing modality (sero-survey vs. clinic) in each analytic dataset. We assessed linkage quality using three approaches: quantifying linkage errors, comparing characteristics between linked and unlinked data, and evaluating bias and precision of regression estimates.ResultsBetween 2014 and 2017, 405 individuals with gold standard links were newly diagnosed with HIV in sero-surveys (n = 263) and clinics (n = 142). Automated probabilistic linkage correctly identified 233 individuals (positive predictive value [PPV] = 65%) at the low threshold and 95 individuals (PPV = 90%) at the high threshold. Significant differences were found between linked and unlinked records in primary exposure and outcome variables and for adjusting covariates at every threshold. As expected, differences attenuated with increasing threshold. Testing modality was significantly associated with time to registration in the gold standard data (adjusted hazard ratio [HR] 4.98 for clinic-based testing, 95% confidence interval [CI] 3.34, 7.42). Increasing false matches weakened the association (HR 2.76 at minimum match score threshold, 95% CI 1.73, 4.41). Increasing missed matches (i.e., increasing match score threshold and positive predictive value of the linkage algorithm) was strongly correlated with a reduction in the precision of coefficient estimate (R2 = 0.97; p = 0.03).ConclusionsSimilar to studies with more negligible levels of linkage errors, false matches in this setting reduced the magnitude of the association; missed matches reduced precision. Adjusting for these biases could provide more robust results using data with considerable linkage errors.

Highlights

  • Studies based on high-quality linked data in developed countries show that even minor linkage errors, which occur when records of two different individuals are erroneously linked or when records belonging to the same individual are not linked, can impact bias and precision of subsequent analyses

  • We evaluated for bias in precision by comparing regression coefficients and standard errors of the primary exposure variable in the gold standard data with those obtained at each selected match score threshold

  • 126 (89%) Human immunodeficiency virus (HIV) diagnoses occurred in the HIV testing and counselling clinic (HTC) and the remaining 16 (11%) diagnoses were made in the antenatal clinic (ANC)

Read more

Summary

Introduction

Studies based on high-quality linked data in developed countries show that even minor linkage errors, which occur when records of two different individuals are erroneously linked or when records belonging to the same individual are not linked, can impact bias and precision of subsequent analyses. Errors arising during the linkage process because of imperfect identifiers can result in two types of linkage errors: false matches (records of two different individuals are erroneously linked) and missed matches (records belonging to the same individual are not linked). These linkage errors have been shown to impact the bias and precision of subsequent analyses [4, 5]. Whether and how analyses are affected by linkage errors in settings with lower data quality and more substantial linkage errors, such as sub-Saharan Africa, remains unknown

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call