Abstract

Introduction

The challenges in identifying a cohort of people with a rare condition can be addressed by routinely collected, population-scale electronic health record (EHR) data, which provide large volumes of data at a national level. This paper describes the challenges of accurately identifying a cohort of children with Cystic Fibrosis (CF) using EHR data and of validating that cohort against the UK CF Registry.

Objectives

To establish a proof of principle and provide insight into the merits of linked data in CF research; to identify the benefits of access to multiple data sources, in particular the UK CF Registry data; and to demonstrate the opportunity it represents as a resource for future CF research.

Methods

Three EHR data sources were used to identify children with CF born in Wales between 1st January 1998 and 31st August 2015 within the Secure Anonymised Information Linkage (SAIL) Databank. The UK CF Registry data were later acquired by SAIL and linked to the EHR cohort to validate the cases and explore the reasons for misclassification.

Results

We identified 352 children with CF in the three EHR data sources. This was greater than expected based on historical incidence rates in Wales. Subsequent validation using the UK CF Registry found that 257 (73%) of these were true cases. Approximately 98.7% (156/158) of individuals identified as CF cases in all three EHR data sources were confirmed as true cases, but this was the case for only 19.8% (20/101) of those identified in just a single data source.

Conclusion

Identifying health conditions in EHR data can be challenging, so data quality assurance and validation are important; without them, the merit of the research is undermined. This retrospective review identifies some of the challenges in identifying CF cases and demonstrates the benefits of linking cases across multiple data sources to improve quality.
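The percentages quoted in the Results follow directly from the stated counts. As a quick check, here is a minimal Python sketch; the function name `percent_confirmed` and the dictionary labels are illustrative only and do not come from the paper.

```python
def percent_confirmed(confirmed: int, identified: int) -> float:
    """Percentage of EHR-identified cases confirmed by the registry, to 1 d.p."""
    if identified <= 0:
        raise ValueError("identified must be a positive count")
    return round(100 * confirmed / identified, 1)


# (confirmed, identified) counts as reported in the abstract.
reported = {
    "any EHR source": (257, 352),
    "all three EHR sources": (156, 158),
    "single EHR source only": (20, 101),
}

for label, (confirmed, identified) in reported.items():
    pct = percent_confirmed(confirmed, identified)
    print(f"{label}: {confirmed}/{identified} = {pct}%")
```

The gap between the second and third rows is the point of the abstract: agreement across sources is a much stronger signal of a true case than a code in a single source.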

Highlights

  • After matching the children on unique Anonymised Linkage Field (ALF) across all three data sources, we found a total of 352 unique children with Cystic Fibrosis (CF) diagnostic codes born between January 1998 and August 2015

  • As CF is a serious condition, often picked up at birth and likely to prompt frequent visits to a General Practitioner (GP) and possibly hospital admissions, there was an expectation that any child with CF would have records in multiple data sources, and the most likely true cases would be those with records in all three data sources
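The matching step described in these highlights can be sketched as counting, for each ALF, how many data sources contain a CF diagnostic code for that child. The example below is a hedged illustration and not the authors' code: the source labels (`source_a`, `source_b`, `source_c`) and the ALF values are hypothetical toy data, and the function name `count_sources_per_alf` is an assumption.

```python
from collections import Counter


def count_sources_per_alf(sources: dict[str, set[str]]) -> Counter:
    """Count how many data sources each ALF appears in.

    Each source is a set of ALFs, so a child is counted at most once per source.
    """
    counts: Counter = Counter()
    for alfs in sources.values():
        counts.update(alfs)
    return counts


# Hypothetical toy data: ALFs with a CF diagnostic code in each source.
sources = {
    "source_a": {"ALF001", "ALF002", "ALF003"},
    "source_b": {"ALF001", "ALF002"},
    "source_c": {"ALF001", "ALF004"},
}

per_alf = count_sources_per_alf(sources)
cohort = sorted(per_alf)  # unique children across all sources
in_all_sources = sorted(a for a, n in per_alf.items() if n == len(sources))
single_source = sorted(a for a, n in per_alf.items() if n == 1)

print(f"unique children: {len(cohort)}")
print(f"in all sources: {in_all_sources}")
print(f"in one source only: {single_source}")
```

Grouping the cohort by number of sources in this way is what allows the most likely true cases (those present in every source) to be separated from those that most need validation.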

Summary

Introduction

There is increasing interest in using routinely collected, population-scale electronic health record (EHR) administrative data to conduct research. At population level, these data sources can provide large volumes of data and a broad longitudinal view of a condition over time [1,2,3]. In Wales, the Secure Anonymised Information Linkage (SAIL) Databank is the national repository of anonymised, person-based, linkable data [4,5]. It holds routine EHR data from primary and secondary health care for about 5 million people from the turn of the century, including the active population of about 3.1 million people.

Open Access under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/deed.en)
