Abstract

Background: Methods for linking real-world healthcare data often use a latent class model, where the latent, or unknown, class is the true match status of candidate record-pairs. This commonly used model assumes that agreement patterns among multiple fields within a latent class are independent. When this assumption is violated, various approaches, including the most commonly proposed loglinear models, have been suggested to account for conditional dependence.

Methods: We present a step-by-step guide to identifying important dependencies between fields through a correlation residual plot and demonstrate how they can be incorporated into loglinear models for record linkage. This method is applied to healthcare data from the patient registry of a large county health department.

Results: Our method can be readily implemented using standard software (with code supplied) and produces an overall better model fit as measured by BIC and deviance. Finding the most parsimonious model is known to reduce bias in parameter estimates.

Conclusions: This novel approach identifies and accommodates conditional dependence in the context of record linkage. The conditional dependence model is recommended for routine use because of its flexibility in incorporating conditional dependence and its easy implementation using existing software.
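The workflow in the abstract (fit the independence model, inspect pairwise correlation residuals, compare fits by BIC and deviance) can be sketched in plain Python. This is a minimal illustration, not the authors' supplied code. It assumes binary agree/disagree comparisons for each field, a two-class (match / non-match) model fitted by EM, and one common definition of the correlation residual: observed minus model-implied pairwise correlation. The function names `fit_latent_class`, `fit_stats` and `correlation_residuals` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

EPS = 1e-6  # keeps probabilities away from 0 and 1 so logs stay finite


def clip(x):
    return min(max(x, EPS), 1.0 - EPS)


def pattern_prob(g, theta):
    """P(agreement pattern g | class) under conditional independence."""
    out = 1.0
    for gj, tj in zip(g, theta):
        out *= tj if gj else 1.0 - tj
    return out


def fit_latent_class(patterns, n_iter=1000, tol=1e-9):
    """EM for a two-class latent class model (hypothetical helper).

    patterns: list of equal-length tuples of 0/1 field agreement indicators.
    Returns p = P(match), m[j] = P(field j agrees | match),
    u[j] = P(field j agrees | non-match).
    """
    n, k = len(patterns), len(patterns[0])
    p, m, u = 0.1, [0.9] * k, [0.1] * k  # start: matches agree more often
    prev = -math.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each candidate pair is a match
        post, ll = [], 0.0
        for g in patterns:
            a = p * pattern_prob(g, m)
            b = (1.0 - p) * pattern_prob(g, u)
            post.append(a / (a + b))
            ll += math.log(a + b)
        # M-step: posterior-weighted agreement rates within each class
        w = sum(post)
        p = w / n
        m = [clip(sum(t * g[j] for t, g in zip(post, patterns)) / w) for j in range(k)]
        u = [clip(sum((1 - t) * g[j] for t, g in zip(post, patterns)) / (n - w)) for j in range(k)]
        if ll - prev < tol:  # EM never decreases the log-likelihood
            break
        prev = ll
    return p, m, u


def fit_stats(patterns, p, m, u):
    """Log-likelihood, deviance (G^2 against the saturated model) and BIC."""
    n, k = len(patterns), len(patterns[0])
    ll, dev = 0.0, 0.0
    for g, obs in Counter(patterns).items():
        prob = p * pattern_prob(g, m) + (1 - p) * pattern_prob(g, u)
        ll += obs * math.log(prob)
        dev += 2.0 * obs * math.log(obs / (n * prob))
    n_params = 2 * k + 1  # p, plus one m and one u per field
    return ll, dev, -2.0 * ll + n_params * math.log(n)


def correlation_residuals(patterns, p, m, u):
    """Observed minus model-implied correlation for every pair of fields.

    Under conditional independence E[g_i g_j] = p m_i m_j + (1-p) u_i u_j.
    Assumes every field takes both values somewhere in the data.
    """
    n, k = len(patterns), len(patterns[0])
    obs_mean = [sum(g[j] for g in patterns) / n for j in range(k)]
    exp_mean = [p * m[j] + (1 - p) * u[j] for j in range(k)]

    def corr(eij, ei, ej):
        return (eij - ei * ej) / math.sqrt(ei * (1 - ei) * ej * (1 - ej))

    res = {}
    for i, j in combinations(range(k), 2):
        obs_ij = sum(g[i] * g[j] for g in patterns) / n
        exp_ij = p * m[i] * m[j] + (1 - p) * u[i] * u[j]
        res[(i, j)] = corr(obs_ij, obs_mean[i], obs_mean[j]) - corr(exp_ij, exp_mean[i], exp_mean[j])
    return res
```

A field pair whose residual stands well apart from the others is a candidate for an interaction term in a loglinear model, and the refitted model can then be compared with the independence model on BIC and deviance. In practice that refit would be done with standard loglinear or latent class software rather than this hand-rolled EM.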

Highlights

  • Methods for linking real-world healthcare data often use a latent class model, where the latent, or unknown, class is the true match status of candidate record-pairs

  • Marion County Health Department (MCHD) is a member of the Indiana Network for Patient Care, the nation’s largest and longest-tenured health information exchange (HIE) [24]

  • The MCHD client registry contains 779,466 patient records gathered from multiple public health service areas


Summary

Introduction

Methods for linking real-world healthcare data often use a latent class model, where the latent, or unknown, class is the true match status of candidate record-pairs. This commonly used model assumes that agreement patterns among multiple fields within a latent class are independent. Deterministic approaches are based on ad-hoc rules, which classify a pair of records as matches if the two … Distance-based methods that can handle numerical or categorical fields, as described in [3], are another way to link records. These methods have been shown to perform comparably to probabilistic methods for both numeric [4] and categorical data [5], but they require one to establish appropriate distance measures for each variable under consideration. They are not investigated further here because they are not commonly used in practice and have not yet been investigated thoroughly in the HIE setting, although they may be of interest in future work [6].
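To make the independence assumption concrete, the two-class model can be written for a vector of binary agreement indicators γ = (γ_1, …, γ_K), with p the match prevalence and m_k, u_k the probabilities that field k agrees among matches and non-matches. This notation is assumed here rather than taken from the paper. The first line is the conditional independence model. The second is the same model in loglinear form for the expected count μ in the (latent class c) × (agreement pattern γ) table. The third sketches one way to add a dependence between fields j and l; dropping the class-specific term gives a dependence shared by both classes. The paper's exact parameterization may differ.

```latex
\begin{aligned}
P(\boldsymbol{\gamma}) &= p \prod_{k=1}^{K} m_k^{\gamma_k}(1-m_k)^{1-\gamma_k}
  + (1-p) \prod_{k=1}^{K} u_k^{\gamma_k}(1-u_k)^{1-\gamma_k} \\[4pt]
\log \mu_{c\boldsymbol{\gamma}} &= \lambda + \lambda^{C}_{c}
  + \sum_{k=1}^{K} \left( \lambda^{F_k}_{\gamma_k} + \lambda^{CF_k}_{c\gamma_k} \right) \\[4pt]
\log \mu_{c\boldsymbol{\gamma}} &= \lambda + \lambda^{C}_{c}
  + \sum_{k=1}^{K} \left( \lambda^{F_k}_{\gamma_k} + \lambda^{CF_k}_{c\gamma_k} \right)
  + \lambda^{F_jF_l}_{\gamma_j\gamma_l} + \lambda^{CF_jF_l}_{c\gamma_j\gamma_l}
\end{aligned}
```

Keeping both the two-way term and the class-specific three-way term preserves a hierarchical model, so the added dependence can be tested against the independence model by comparing deviance or BIC.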
