Abstract

Background

Research using linked routine population-based data collected for non-research purposes has increased in recent years because such data are a rich and detailed source of information. The objective of this study is to present an approach to prepare and link data from administrative sources in a middle-income country, to estimate linkage quality, and to identify potential sources of bias by comparing linked and non-linked individuals.

Methods

We linked two Brazilian administrative datasets covering the period 2001 to 2015, using maternal attributes (name, age, date of birth, and municipality of residence): the live birth information system and the 100 Million Brazilian Cohort (created using administrative records from over 114 million individuals whose families applied for social assistance via the Unified Register for Social Programmes). The linkage was performed with CIDACS-RL, a linkage tool developed in-house. We then estimated the proportion of highly probable links and examined the characteristics of missed-matches to identify any potential source of bias.

Results

A total of 27,699,891 live births were submitted to linkage with maternal information recorded in the baseline of the 100 Million Brazilian Cohort dataset. Of those, 16,447,414 (59.4%) children were found registered in the 100 Million Brazilian Cohort dataset. The proportion of highly probable links ranged from 39.3% in 2001 to 82.1% in 2014. We observed a substantial improvement in linkage after the maternal date of birth attribute was introduced in 2011. Our analyses indicated a slightly higher proportion of missing data among missed-matches, and a higher proportion of people living in an urban area and self-declared as Caucasian among linked pairs when compared with non-linked sets.

Discussion

We demonstrated that CIDACS-RL is capable of performing high quality linkage even with a limited number of common attributes, using indexation as a blocking strategy in large routine databases from a middle-income country. However, residual records were more frequent among people living under worse conditions. The results presented in this study reinforce the need to evaluate linkage quality and, when necessary, to take linkage error into account in the analyses of any generated dataset.

Highlights

  • Research using linked routine population-based data collected for non-research purposes has increased in recent years because such data are a rich and detailed source of information

  • This study presents an approach to prepare and link data from administrative sources in a middle-income country, estimating the proportion of births for which a link could be identified based on a specified threshold and identifying potential sources of bias by comparing linked and non-linked records

  • We demonstrated that CIDACS-RL is capable of performing high quality linkage even with a limited number of common attributes, using indexation as a blocking strategy in a large routine dataset from a middle-income country


Summary

Introduction

Research using routine population-based data collected for social, financial, and clinical purposes has increased in recent years because such data are a rich and detailed source of information available at a relatively low cost [1]. Probabilistic record linkage solutions are suitable when there is no shared key that unequivocally identifies an individual across disparate data sources [7, 8]. This situation is frequent in many countries, in particular in low- and middle-income ones. Probabilistic linkage compares the most reliable and discriminative variables present in both databases to calculate similarity scores representing the likelihood that two records belong to the same person. A threshold on these scores then determines which pairs are accepted as links. The choice of threshold needs to balance the risk of “false-matches” (records from different individuals that are mistakenly linked) and “missed-matches” (records from the same individual that fail to link) [9].
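To make the idea of a similarity score and a threshold concrete, the following is a minimal sketch in Python. It is not the CIDACS-RL implementation: the field names, the weights, the use of `difflib.SequenceMatcher` as the string comparator, and the threshold value of 0.9 are all assumptions chosen for illustration only.

```python
from difflib import SequenceMatcher

# Hypothetical integer weights for each compared attribute (not from the study).
WEIGHTS = {"mother_name": 5, "mother_dob": 3, "municipality": 2}


def field_similarity(a, b):
    """Return a string similarity in [0, 1]; a missing value contributes 0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_score(rec_a, rec_b, weights=WEIGHTS):
    """Weighted average of per-field similarities, in [0, 1]."""
    total = sum(weights.values())
    weighted = sum(
        w * field_similarity(rec_a.get(field), rec_b.get(field))
        for field, w in weights.items()
    )
    return weighted / total


def classify(score, threshold=0.9):
    """Accept the pair as a link only if the score reaches the threshold."""
    return "link" if score >= threshold else "non-link"


# Illustrative, made-up records.
birth = {"mother_name": "Maria da Silva", "mother_dob": "1985-03-12",
         "municipality": "Salvador"}
cohort_typo = {"mother_name": "Maria da Silvs", "mother_dob": "1985-03-12",
               "municipality": "Salvador"}
cohort_other = {"mother_name": "Joana Pereira", "mother_dob": "1990-11-02",
                "municipality": "Recife"}

for candidate in (cohort_typo, cohort_other):
    score = similarity_score(birth, candidate)
    print(candidate["mother_name"], round(score, 3), classify(score))
```

In this sketch, raising `threshold` would be expected to reduce false-matches at the cost of more missed-matches, and lowering it would do the opposite, which is the trade-off described above.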
