Abstract
Background: Temporal variability in health-care processes or protocols is intrinsic to medicine. Such variability can potentially introduce dataset shifts, a data quality issue when reusing electronic health records (EHRs) for secondary purposes. Temporal dataset shifts can present as trends, as well as abrupt or seasonal changes in the statistical distributions of data over time. The latter are particularly complicated to address in multimodal and highly coded data. These changes, if not delineated, can harm population and data-driven research, such as machine learning. Given that biomedical research repositories are increasingly being populated with large sets of historical data from EHRs, there is a need for specific software methods to help delineate temporal dataset shifts to ensure reliable data reuse.
Results: EHRtemporalVariability is an open-source R package and Shiny app designed to explore and identify temporal dataset shifts. EHRtemporalVariability estimates the statistical distributions of coded and numerical data over time; projects their temporal evolution through non-parametric information geometric temporal plots; and enables the exploration of changes in variables through data temporal heat maps. We demonstrate the capability of EHRtemporalVariability to delineate dataset shifts in three impact case studies, one of which is available for reproducibility.
Conclusions: EHRtemporalVariability enables the exploration and identification of dataset shifts, contributing to the broad examination and repurposing of large, longitudinal datasets. Our goal is to help ensure reliable data reuse for a wide range of biomedical data users. EHRtemporalVariability is designed both for technical users who work with the R package programmatically and for users unfamiliar with programming, who can use the Shiny user interface.
Availability: https://github.com/hms-dbmi/EHRtemporalVariability/
Reproducible vignette: https://cran.r-project.org/web/packages/EHRtemporalVariability/vignettes/EHRtemporalVariability.html
Online demo: http://ehrtemporalvariability.upv.es/
Highlights
Temporal variability in healthcare processes or protocols is intrinsic to medicine
EHRtemporalVariability is suited both to technical users working programmatically with the R package and to users unfamiliar with programming, who can use the Shiny user interface
EHRtemporalVariability is based on the probabilistic temporal variability methods that we developed and validated over 5 years,[6,9,11] namely Information-Geometric-Temporal (IGT) plots and Data Temporal Heatmaps (DTHs)
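The idea behind the two methods named above can be illustrated with a small, self-contained sketch. It is written in Python rather than R and does not use or reproduce the EHRtemporalVariability API. It rests on several assumptions that are ours for illustration: monthly batching, toy records, the Jensen-Shannon distance as the dissimilarity between batches, and the helper names `monthly_distributions` and `js_distance`. The matrix of monthly relative frequencies is the kind of data a temporal heat map would display, and the pairwise distance matrix is the kind of input a low-dimensional projection such as an IGT plot would start from. The projection step itself is omitted.

```python
import math
from collections import Counter, defaultdict


def monthly_distributions(records):
    """Return (months, codes, matrix) with one relative-frequency row per month."""
    counts = defaultdict(Counter)
    for month, code in records:
        counts[month][code] += 1
    months = sorted(counts)
    codes = sorted({code for _, code in records})
    matrix = []
    for month in months:
        total = sum(counts[month].values())
        matrix.append([counts[month][c] / total for c in codes])
    return months, codes, matrix


def js_distance(p, q):
    """Jensen-Shannon distance (base-2 logarithm), bounded in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


# Toy data (hypothetical): code "A" is replaced by code "C" from March onward,
# mimicking an abrupt change in coding practice.
records = (
    [("2020-01", "A")] * 8 + [("2020-01", "B")] * 2
    + [("2020-02", "A")] * 8 + [("2020-02", "B")] * 2
    + [("2020-03", "C")] * 8 + [("2020-03", "B")] * 2
    + [("2020-04", "C")] * 8 + [("2020-04", "B")] * 2
)

months, codes, heatmap = monthly_distributions(records)

# Pairwise dissimilarity between monthly batches.
distances = [[js_distance(p, q) for q in heatmap] for p in heatmap]

for month, row in zip(months, distances):
    print(month, [round(d, 3) for d in row])
```

In this toy example, months on the same side of the coding change are at distance zero from each other, while months on opposite sides are far apart, which is the pattern an abrupt shift would produce.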
Summary
Temporal variability in healthcare processes or protocols is intrinsic to medicine. Such variability can potentially introduce dataset shifts, a data quality issue when reusing electronic health records (EHRs) for secondary purposes. The widespread adoption of data-sharing technologies, health information standards, and open-data initiatives is inspiring the creation of research data repositories with large-scale historical data from EHRs.[1] This represents a new class of longitudinal, real-world data, defined as large datasets collected over time from sources outside of clinical trials or specific research cohorts. Reuse of this data, ranging from clinical observations to molecular information, has begun to boost the efficacy and generalization of biomedical and clinical research. Clinical care processes and their local variations are permeated with a variety of batch effects and biases.[5,6,7,8,9] This is similar to the situation in genomics and other “omics” research, where batch effects can be introduced by technical sources of variation that have been added to samples during acquisition handling.[15,16]