Abstract

Background: Hospital patient registries provide substantial longitudinal data sets describing the clinical and medical health statuses of inpatients and their pharmacological prescriptions. Despite the multiple advantages of routinely collecting multidimensional longitudinal data, those data sets are rarely suitable for advanced statistical analysis and they require customization and synthesis.

Objective: The aim of this study was to describe the methods used to transform and synthesize a raw, multidimensional, hospital patient registry data set into an exploitable database for the further investigation of risk profiles and predictive and survival health outcomes among polymorbid, polymedicated, older inpatients in relation to their medicine prescriptions at hospital discharge.

Methods: A raw, multidimensional data set from a public hospital was extracted from the hospital registry in a CSV (.csv) file and imported into the R statistical package for cleaning, customization, and synthesis. Patients fulfilling the criteria for inclusion were home-dwelling, polymedicated, older adults with multiple chronic conditions aged ≥65 who became hospitalized. The patient data set covered 140 variables from 20,422 hospitalizations of polymedicated, home-dwelling older adults from 2015 to 2018. Each variable, according to type, was explored and computed to describe distributions, missing values, and associations. Different clustering methods, expert opinion, recoding, and missing-value techniques were used to customize and synthesize these multidimensional data sets.

Results: Sociodemographic data showed no missing values. Average age, hospital length of stay, and frequency of hospitalization were computed. Discharge details were recoded and summarized. Clinical data were cleaned up and best practices for managing missing values were applied. Seven clusters of medical diagnoses, surgical interventions, somatic, cognitive, and medicines data were extracted using empirical and statistical best practices, with each presenting the health status of the patients included in it as accurately as possible. Medical, comorbidity, and drug data were recoded and summarized.

Conclusions: A cleaner, better-structured data set was obtained, combining empirical and best-practice statistical approaches. The overall strategy delivered an exploitable, population-based database suitable for an advanced analysis of the descriptive, predictive, and survival statistics relating to polymedicated, home-dwelling older adults admitted as inpatients. More research is needed to develop best practices for customizing and synthesizing large, multidimensional, population-based registries.

International Registered Report Identifier (IRRID): RR2-10.1136/bmjopen-2019-030030

Introduction

The transition from paper-based patient records to electronic health records has provided unprecedented access to vast amounts of diverse clinical and health data at the point of care [1]. This transition offers a huge opportunity to exploit patient registries for scientific, clinical, and health-policy purposes. Several national registries (eg, the National Committee on Vital and Health Statistics, or the Agency for Healthcare Research and Quality, both in the United States) are used for a broad range of purposes in public health and medicine as part of “an organized system for the collection, storage, retrieval, analysis, and dissemination of information on individual persons who have either a particular disease, a condition (eg, a risk factor) that predisposes the occurrence of a health-related event, or prior exposure to substances (or circumstances) known or suspected to cause adverse health effects” [1]. Despite the multiple advantages of routinely collecting multidimensional longitudinal data, those data sets are rarely suitable for advanced statistical analysis and they require customization and synthesis.

