Automated data cleaning of paediatric anthropometric data from longitudinal electronic health records: protocol and application to a large patient cohort

Hang T T Phan, Florina Borca, David Cable, James Batchelor, Sarah Ennis, Justin H Davies

doi:10.1038/s41598-020-66925-7

Abstract

‘Big data’ in healthcare encompass measurements collated from multiple sources with various degrees of data quality. These data require quality control assessment to optimise quality for clinical management and for robust large-scale data analysis in healthcare research. Height and weight data represent one of the most abundantly recorded health statistics. The shift to electronic recording of anthropometric measurements in electronic healthcare records has rapidly inflated the number of measurements. WHO guidelines inform removal of population-based extreme outliers, but an absence of tools limits cleaning of longitudinal anthropometric measurements. Using a manually curated set of 6,279 patients’ longitudinal measurements, we developed and optimised a protocol for cleaning paediatric height and weight data that incorporates outlier detection based on robust linear regression. The protocol was then applied to a cohort of 200,000 patient records collected from 60,000 paediatric patients attending a regional teaching hospital in South England. WHO guidelines detected biologically implausible data in <1% of records. The protocol detected additional error rates of 3% for height and 0.2% for weight. Inflated error rates for height measurements were largely due to small but physiologically implausible decreases in height. The lowest error rates were observed when data were measured and digitally recorded by staff routinely required to do so. The protocol successfully automates the parsing of implausible and poor-quality height and weight data from a voluminous longitudinal dataset and standardises the quality assessment of data for clinical and research applications.

Highlights

‘Big data’ in healthcare encompass measurements collated from multiple sources with various degrees of data quality
The ‘gold-standard’ University Hospital Southampton (UHS) height and weight dataset enabled assessment of true data quality. Both height and weight measurements across the 2008–2018 period were stable, with an error rate of ~3% for height and 0.2% for weight (Fig. 1).
This study presents a bespoke protocol for automated anthropometric data cleaning that has been tested across a sizeable dataset captured from a regional teaching hospital in South England

Summary

Introduction

‘Big data’ in healthcare encompass measurements collated from multiple sources with various degrees of data quality. Others[10] have suggested removing weight measurements where annual changes exceed 22.7 kg (or 27.2 kg if the individual was severely obese at baseline), any height decrease, and any height increase >15 cm a year. These methods were developed for identifying extreme changes in periodical measurements and do not detect less extreme changes, so they are not applicable to children, in whom growth is dynamic. Anthropometric data on children have been systematically recorded, improving the accuracy of growth data presentation on a growth chart and enhancing the experience of sharing growth data by clinicians between paediatric specialities. This has presented an opportunity for research studies to use longitudinal routine patient care anthropometric data and make correlations between childhood growth and development of disease or efficacy of therapy. The anthropometric data must be cleaned and processed before they are used for research purposes.
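To make the threshold rules attributed to reference [10] concrete, the following is a minimal Python sketch of how such a filter might look. It is not the protocol developed in this paper and not the cited authors' code. The names `Measurement` and `flag_extreme_changes` are hypothetical, and annualising the change between consecutive measurements (dividing by the interval in years) is an assumption, since the text does not say how "annual change" is computed.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Measurement:
    """One visit's readings (hypothetical record layout)."""
    taken_on: date
    height_cm: float
    weight_kg: float


def flag_extreme_changes(measurements, severely_obese_at_baseline=False):
    """Flag consecutive-visit changes that break the fixed thresholds.

    Returns a list of (index, reason) pairs, where index refers to the
    position of the later measurement after sorting by date.
    """
    # Thresholds quoted in the text: 22.7 kg/year, or 27.2 kg/year
    # if the individual was severely obese at baseline.
    weight_limit_kg = 27.2 if severely_obese_at_baseline else 22.7
    height_limit_cm = 15.0

    ordered = sorted(measurements, key=lambda m: m.taken_on)
    flags = []
    for i in range(1, len(ordered)):
        prev, curr = ordered[i - 1], ordered[i]
        # Assumption: annualise the change over the gap between visits.
        years = (curr.taken_on - prev.taken_on).days / 365.25
        if years <= 0:
            continue  # same-day readings cannot be annualised

        height_change = curr.height_cm - prev.height_cm
        weight_change = curr.weight_kg - prev.weight_kg

        if height_change < 0:
            flags.append((i, "height_decrease"))
        elif height_change / years > height_limit_cm:
            flags.append((i, "height_gain"))

        if abs(weight_change) / years > weight_limit_kg:
            flags.append((i, "weight_change"))
    return flags
```

Fixed cut-offs of this kind catch only gross errors. A child whose recorded weight jumps by 10 kg in a year passes the filter regardless of age, which illustrates why the text treats these rules as unsuitable for paediatric data.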

Journal: Scientific Reports	Publication Date: Jun 23, 2020
Citations: 22	License type: open-access
