Abstract

Electronic health records are increasingly used for research. The definition of cases or endpoints often relies on coded diagnostic data, using a pre-selected group of codes. Validation of these cases, as ‘true’ cases of the disease, is crucial. There are, however, ambiguities in what is meant by validation in the context of electronic records. Validation usually implies comparison of a definition against a gold standard of diagnosis and the ability to identify false negatives (‘true’ cases which were not detected) as well as false positives (detected cases which did not have the condition). We argue that two separate concepts of validation are often conflated in existing studies: first, whether the GP thought the patient was suffering from a particular condition (which we term confirmation or internal validation), and second, whether the patient really had the condition (external validation). Few studies are able to detect false negatives, that is, patients with the condition who have not received a diagnostic code. Natural language processing is likely to open up the use of free text within the electronic record, which will facilitate both the validation of the coded diagnosis and the search for false negatives. Copyright © 2011 John Wiley & Sons, Ltd.

Highlights

  • Electronic health records (EHRs) offer great potential for research, enabling the rapid identification of patients for inclusion in intervention or observational studies

  • Primary care records in the UK have been computerised for several decades, and electronic records are almost universal in GP practices

  • Most studies that use the GPRD rely on coded diagnoses to identify cases, and related validation studies attempt to show whether cases with diagnostic codes do have that condition

Summary

Introduction

Electronic health records (EHRs) offer great potential for research, enabling the rapid identification of patients for inclusion in intervention or observational studies. Researchers find it difficult to access and use large amounts of free text, owing to issues of confidentiality, the costs of anonymisation and the need to structure and code the information it contains. Most studies that use the General Practice Research Database (GPRD), or most other electronic record systems, rely on coded diagnoses to identify cases, and related validation studies attempt to show whether cases with diagnostic codes do have that condition.
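As an illustration only, the comparison of a code-based case definition against a gold standard can be sketched in a few lines of Python. This is a minimal sketch, not the method of any particular study: the `Patient` record, the `validate_code_list` function and the codes `X001`, `X002` and `Z999` are all hypothetical, and the sketch assumes that a gold-standard diagnosis is available for every patient, including those without a diagnostic code. Without that assumption, false negatives cannot be counted and only the positive predictive value can be estimated.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Patient:
    """Hypothetical record: codes in the EHR plus a reference diagnosis."""
    patient_id: str
    codes: FrozenSet[str]   # diagnostic codes recorded for this patient
    gold_standard: bool     # True if the gold standard confirms the condition


def validate_code_list(patients: Iterable[Patient],
                       code_list: Iterable[str]) -> Dict[str, Optional[float]]:
    """Compare a code-based case definition with the gold standard."""
    code_set = set(code_list)
    tp = fp = fn = tn = 0
    for p in patients:
        detected = bool(p.codes & code_set)  # any code from the list recorded
        if detected and p.gold_standard:
            tp += 1   # detected case that has the condition
        elif detected:
            fp += 1   # false positive: detected, but no condition
        elif p.gold_standard:
            fn += 1   # false negative: 'true' case without a diagnostic code
        else:
            tn += 1
    # Ratios are None when their denominator is zero.
    ppv = tp / (tp + fp) if (tp + fp) else None
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "ppv": ppv, "sensitivity": sensitivity}


# Made-up example data; the codes are placeholders, not real diagnostic codes.
patients = [
    Patient("A", frozenset({"X001"}), True),
    Patient("B", frozenset({"X001"}), False),
    Patient("C", frozenset({"Z999"}), True),
    Patient("D", frozenset(), False),
    Patient("E", frozenset({"X002"}), True),
]
result = validate_code_list(patients, ["X001", "X002"])
print(result)
```

In this made-up example, patient C has the condition but none of the listed codes. A study that reviews only coded cases would never examine patient C, so it could report the positive predictive value but not the sensitivity.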
