Abstract

NJK Cochrane, RB Hubbard, JE Gibson
Division of Epidemiology & Public Health, School of Medicine, University of Nottingham, Nottingham, United Kingdom
Contact: mcxnc5@nottingham.ac.uk

Background
Analyses of electronic medical records (EMRs) are widespread in public health research. Few such analyses study dose dependence, because prescribed dosages and frequencies of medication use are typically recorded as unstructured text and are difficult to work with. We therefore developed an algorithm, based on n-gram sequencing, to automatically convert text instructions to numeric daily dosages and validated it using a large database of UK primary care EMRs.

Methods
The 26,000 most frequently occurring dosage instructions were identified from all prescriptions recorded in The Health Improvement Network (THIN). The first 5,000 of these were sequenced to identify and encode common phrases. An algorithm was developed which used these encoded phrases to estimate daily prescribed dosages among all 26,000 instructions. The resulting quantities were manually validated to determine the accuracy of the method.

Results
Of the 26,000 common instructions, the algorithm correctly interpreted 22,285 (86%), with 18,575 (88%) found to be correctly determined on inspection. Among the instructions not included in the training data, there was no evidence of a decline in accuracy among those which occurred less frequently. Of 945 million prescriptions recorded in THIN prior to September 2012, 826 million (82%) contained one of the 26,000 instructions tested. Of these, 817 million (99%) instructions were correctly interpreted, resulting in 680 million (82%) useable values after excluding instructions which did not specify a dosage.

Conclusions
Dosage instructions from prescription records can be accurately converted to numeric quantities by automated means. Integration of this work into large databases of EMRs can strengthen the results of pharmacoepidemiological studies using these resources.

Key messages
Information contained within free text can be difficult to quantify without resorting to time-consuming manual coding. This work shows that a semi-automated approach can be achieved.
If automated methods can be developed, they have great potential for reuse and enhancement.
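
To make the approach described under Methods more concrete, the following is a minimal Python sketch of the general idea: count n-grams across dosage instructions to surface common phrases, then use a small table of encoded phrases to estimate a daily quantity. It is an illustration under stated assumptions, not the authors' algorithm. The function names (`tokenize`, `count_ngrams`, `daily_dose`), the phrase tables, and the example instructions are all hypothetical and are not taken from THIN or from the study.

```python
import re
from collections import Counter


def tokenize(instruction):
    """Lower-case an instruction and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", instruction.lower())


def count_ngrams(instructions, n):
    """Count every run of n consecutive tokens across all instructions."""
    counts = Counter()
    for text in instructions:
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Hypothetical phrase encodings; in practice these would be derived from
# the most frequent n-grams found in a training set of instructions.
QUANTITY = {"one": 1.0, "two": 2.0, "three": 3.0,
            "1": 1.0, "2": 2.0, "3": 3.0}
FREQUENCY = {
    ("three", "times", "a", "day"): 3.0,
    ("twice", "a", "day"): 2.0,
    ("once", "a", "day"): 1.0,
    ("at", "night"): 1.0,
}


def find_phrase(tokens, phrase):
    """Return the start index of phrase within tokens, or -1 if absent."""
    n = len(phrase)
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == phrase:
            return i
    return -1


def daily_dose(instruction):
    """Estimate units per day, or None if no dosage can be determined."""
    tokens = tokenize(instruction)
    for phrase, per_day in FREQUENCY.items():
        start = find_phrase(tokens, phrase)
        if start >= 0:
            # Drop the frequency phrase so its words are not mistaken
            # for the quantity (e.g. "three" in "three times a day").
            rest = tokens[:start] + tokens[start + len(phrase):]
            for token in rest:
                if token in QUANTITY:
                    return QUANTITY[token] * per_day
            return None
    return None
```

In this sketch, an instruction such as "as directed" yields `None`, which corresponds loosely to the instructions that the abstract describes as not specifying a dosage. A real implementation would need far larger phrase tables and handling for ranges, units, and abbreviations.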
