Abstract

Background: Natural human languages show a power law behaviour in which word frequency (in any large enough corpus) is inversely proportional to word rank - Zipf’s law. We have therefore asked whether similar power law behaviours could be seen in data from electronic patient records.

Results: In order to examine this question, anonymised data were obtained from all general practices in Salford covering a seven year period and captured in the form of Read codes. It was found that data for patient diagnoses and procedures followed Zipf’s law. However, the medication data behaved very differently, looking much more like a referential index. We also observed differences in the statistical behaviour of the language used to describe patient diagnosis as a function of an anonymised GP practice identifier.

Conclusions: This work demonstrates that data from electronic patient records do follow Zipf’s law. We also found significant differences in Zipf’s law behaviour in data from different GP practices. This suggests that computational linguistic techniques could become a useful additional tool to help understand and monitor the data quality of health records.
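
The rank-frequency relationship behind Zipf’s law can be illustrated with a short sketch. It is not the analysis used in the study, whose fitting method is not described in this excerpt. It assumes the coded records are available as a plain list of code strings; the function names `rank_frequency` and `zipf_slope` and the synthetic codes are hypothetical. Under an ideal Zipf distribution, a plot of log(count) against log(rank) is a straight line with a slope close to -1.

```python
import math
from collections import Counter


def rank_frequency(codes):
    """Return (rank, count) pairs, with rank 1 for the most frequent code."""
    counts = sorted(Counter(codes).values(), reverse=True)
    return list(enumerate(counts, start=1))


def zipf_slope(codes):
    """Ordinary least-squares slope of log(count) against log(rank).

    Needs at least two distinct codes, otherwise the slope is undefined.
    """
    pairs = rank_frequency(codes)
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(count) for _, count in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic, made-up data: the code at rank k appears 1200 // k times,
# so the counts are exactly inversely proportional to rank.
codes = [f"code{k}" for k in range(1, 7) for _ in range(1200 // k)]
slope = zipf_slope(codes)
print(f"fitted log-log slope: {slope:.3f}")
```

A slope far from -1, or a log-log plot that is clearly not a straight line, would suggest that the data do not follow Zipf’s law, which is the kind of contrast the abstract reports for the medication data. A simple least-squares fit is only one possible way to estimate the exponent and is chosen here for brevity.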

Highlights

  • Natural human languages show a power law behaviour in which word frequency is inversely proportional to word rank - Zipf’s law

  • A recent survey has shown that 90% of patient contact with the National Health Service (NHS) in the UK is through General Practices and General Practitioners (GPs) [1]

  • Over 98% of the UK population is registered with a general practitioner and almost all GPs use computerised patient record systems, providing a unique and valuable resource of data [2]

Summary

Introduction

Natural human languages show a power law behaviour in which word frequency (in any large enough corpus) is inversely proportional to word rank - Zipf’s law. Over 98% of the UK population is registered with a general practitioner and almost all GPs use computerised patient record systems, providing a unique and valuable resource of data [2]. Clinical terminologies are required by electronic patient record systems to capture, process, use, transfer and share data in a standard form [4] by providing a mechanism to encode patient data in a structured and common language [5]. This standard language helps improve sharing and communication of information throughout the health system and beyond [6,7]. Codes assigned to patient encounters with the health system can be used for many purposes such as automated medical decision support, disease surveillance, payment and reimbursement of

