Abstract

To overcome the limitations of relying on data from a single institution, many researchers have studied data linkage methodologies. Data linkage introduces errors owing to legal issues surrounding personal information and technical issues related to data processing. Linkage errors affect selection bias as well as external and internal validity. Therefore, verifying the quality of each linkage method while adhering to personal information protection is an important issue. This study evaluated the linkage quality of linked data and analyzed the potential bias resulting from linkage errors, using claims data submitted to the Health Insurance Review and Assessment Service (HIRA data). The linkage errors of two deterministic linkage methods were evaluated according to the match key used. The first deterministic linkage used a unique identification number, and the second used the name, gender, and date of birth as a set of partial identifiers. The linkage error of the latter method was assessed by comparing baseline characteristics using Cohen's absolute standardized difference (ASD), and the linkage quality was evaluated through the following indicators: linked rate, false match rate, missed match rate, positive predictive value, sensitivity, specificity, and F1-score. For the deterministic linkage method that used the name, gender, and date of birth as a set of partial identifiers, the true match rate was 83.5% and the missed match rate was 16.5%. Although there was bias in some characteristics of the data, most of the ASD values were less than 0.1, and none was greater than 0.5. Therefore, it is difficult to determine whether linked data constructed with deterministic linkage differ substantially. As the first data linkage quality verification study using big data from the HIRA, this study confirms the feasibility of building health and medical data at the national level.
Analyzing the quality of linkages is crucial for comprehending linkage errors and generating reliable analytical outcomes. Linkers should increase the reliability of linked data by providing linkage error-related information to researchers. The results of this study will serve as reference data to increase the reliability of multicenter data linkage studies.
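
For readers unfamiliar with the indicators named above, the following Python sketch shows one common way to compute them from pair-level counts, together with the ASD for a binary characteristic. It is an illustration under stated assumptions, not the procedure used in this study: the function names, the definitions of linked rate and false match rate (which vary across the literature), and the example counts are all hypothetical. The missed match rate is taken as 1 minus sensitivity, which is consistent with the reported true match and missed match rates summing to 100%.

```python
from math import sqrt


def linkage_quality(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Linkage quality indicators from pair-level counts (hypothetical helper).

    tp: true matches that were linked      fp: non-matches that were linked
    fn: true matches that were not linked  tn: non-matches that were not linked
    """
    linked = tp + fp          # pairs the linkage method declared a match
    true_matches = tp + fn    # pairs that match under the gold standard
    ppv = tp / linked
    sensitivity = tp / true_matches
    return {
        # Assumption: linked rate = share of all candidate pairs that were linked.
        "linked_rate": linked / (tp + fp + fn + tn),
        # Assumption: false match rate = share of linked pairs that are wrong (1 - PPV).
        "false_match_rate": fp / linked,
        # Missed match rate = share of true matches not linked (1 - sensitivity).
        "missed_match_rate": fn / true_matches,
        "ppv": ppv,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


def asd_proportion(p_a: float, p_b: float) -> float:
    """Absolute standardized difference between two proportions.

    p_a and p_b are the proportions of a binary baseline characteristic in the
    two groups being compared (which groups to compare is an assumption here).
    """
    pooled_sd = sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)
    # If both groups have zero variance, the difference is zero by construction.
    return abs(p_a - p_b) / pooled_sd if pooled_sd > 0 else 0.0


# Illustrative counts only; these are not data from the study.
quality = linkage_quality(tp=90, fp=5, fn=10, tn=95)
for name, value in quality.items():
    print(f"{name}: {value:.3f}")
print(f"asd: {asd_proportion(0.50, 0.40):.3f}")
```

Under the thresholds mentioned in the abstract, an ASD below 0.1 computed this way would be read as a negligible difference in that characteristic.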

