Abstract

A Full Blood Count (FBC) is a common blood test comprising 20 parameters, such as haemoglobin and platelets. FBCs from Electronic Health Record (EHR) databases provide a large sample of anonymised individual patient data and are increasingly used in research. We describe the quality of the FBC data in one EHR. We accessed the Test dataset from the Clinical Practice Research Datalink (CPRD), which contains results of tests performed in primary care, such as FBC blood tests. Medical codes and entity codes, two coding systems used within CPRD to identify FBC records, were compared, and the level of mismatched coding and the number of mismatches that could be rectified were reported. The reliability of units of measurement is also described, and missing data are discussed. There were 14 entity codes and 138 medical codes for the FBC in the data. Medical and entity codes consistently corresponded to the same FBC parameter in 95.2% (n = 217,752,448) of parameters. Among the 4.8% (n = 10,955,006) of mismatches, the most commonly rectified parameter was mean platelet volume (n = 2,041,360); 1,191,540 mismatches could not be rectified and were removed. Units of measurement were often missing, partially entered, or did not appear to correspond to the blood value. The final dataset contained 16,537,017 FBC tests. Applying mathematical equations to derive some missing parameters in these FBCs resulted in an average of 15 of 20 parameters available per FBC, with 0.3% of FBCs having all 20 parameters. Performing data quality checks can help researchers understand the extent of any issues in the dataset. We emphasise balancing large sample sizes against the reliability of the data.
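The abstract does not state which equations were used to derive missing parameters. As a minimal sketch only, the standard red cell index formulae are one plausible example of this kind of derivation. The function name, argument names, and assumed units (haemoglobin in g/dL, red blood cell count in 10^12/L, haematocrit in %) are illustrative assumptions, not the authors' code.

```python
from typing import Dict, Optional


def derive_red_cell_indices(
    hb_g_dl: Optional[float],
    rbc_10e12_l: Optional[float],
    hct_percent: Optional[float],
) -> Dict[str, Optional[float]]:
    """Derive MCV, MCH and MCHC from recorded values where possible.

    Assumed units (not confirmed by the source): haemoglobin in g/dL,
    red blood cell count in 10^12/L, haematocrit in %.
    A derived value is None when an input is missing or a divisor is
    not positive.
    """

    def ratio(num: Optional[float], den: Optional[float], scale: float) -> Optional[float]:
        # Refuse to derive anything from missing or implausible inputs.
        if num is None or den is None or den <= 0:
            return None
        return num / den * scale

    return {
        "mcv_fl": ratio(hct_percent, rbc_10e12_l, 10.0),    # MCV  = Hct / RBC x 10
        "mch_pg": ratio(hb_g_dl, rbc_10e12_l, 10.0),        # MCH  = Hb / RBC x 10
        "mchc_g_dl": ratio(hb_g_dl, hct_percent, 100.0),    # MCHC = Hb / Hct x 100
    }
```

Because unit fields in the data were often missing or inconsistent, a derivation like this would only be safe after the input units have been checked.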

Highlights

  • Electronic Health Records (EHR) are databases that store routinely-collected anonymised individual patient data

  • As laboratory data are frequently used across medical research, we provide recommendations and guidance for researchers who wish to access and analyse EHR data in the future, and make available the statistical code we used to validate the Clinical Practice Research Datalink (CPRD) data so that other researchers can use it

  • Full Blood Count (FBC)-related codes: in total, there were 325 different entity codes and 10,963 different medical codes used in the Test dataset
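The comparison of medical and entity codes described in the abstract can be sketched as a concordance check. This is a minimal illustration under stated assumptions: the lookup tables, the code values in them, and the function names are hypothetical placeholders, not real CPRD codes or the authors' implementation. The rule used to rectify mismatches is not given in the source, so it is not modelled here.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical lookups: these code values are placeholders, not real CPRD codes.
MEDCODE_TO_PARAM = {
    101: "haemoglobin",
    102: "platelets",
    103: "mean platelet volume",
}
ENTTYPE_TO_PARAM = {
    1: "haemoglobin",
    2: "platelets",
    3: "mean platelet volume",
}


def classify_record(medcode: int, enttype: int) -> str:
    """Label one test record by whether its two codes agree on the parameter."""
    med_param = MEDCODE_TO_PARAM.get(medcode)
    ent_param = ENTTYPE_TO_PARAM.get(enttype)
    if med_param is None or ent_param is None:
        # At least one code is not an FBC code in the lookups.
        return "unmapped"
    return "match" if med_param == ent_param else "mismatch"


def summarise(records: Iterable[Tuple[int, int]]) -> Counter:
    """Count matches, mismatches and unmapped records."""
    return Counter(classify_record(medcode, enttype) for medcode, enttype in records)
```

For example, `summarise([(101, 1), (102, 3)])` counts one match and one mismatch, since the second record pairs a platelet medical code with a mean platelet volume entity code.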


Summary

Introduction

Electronic Health Records (EHR) are databases that store routinely-collected anonymised individual patient data. These databases were set up to aid patient care and monitor clinical services. Their use for research is a secondary development, which has allowed researchers to conduct large-scale medical studies and perform retrospective analyses. Under the GP Systems of Choice (GPSoC) framework [1], primary care practices use the EHR software system that best suits their data management needs. The Clinical Practice Research Datalink (CPRD) [2] collects patient records from contributing GP practices in the UK using the Vision and, more recently, EMIS software systems.
