Abstract

Background: Electronic health records are a valuable asset for research, but their use is challenging because of inconsistent records, heterogeneous formats, and the distribution of data across multiple, non-integrated information systems. Hence, specialized expertise in health data engineering and data science is required to enable research. To facilitate secondary use of clinical routine data collected in our intensive care wards, we developed a scalable approach consisting of cohort generation, variable filtering, and data extraction steps.

Objective: With this report, we share our workflow for data requests, cohort identification, and data extraction. We present an algorithm for automatic data extraction from our critical care information system (CCIS) that can be adapted to other object-oriented databases.

Methods: We introduced a data request process with functionality for automated identification of patient cohorts, and a specialized hierarchical data structure that supports filtering relevant variables from the CCIS and further systems for the specified cohorts. The data extraction algorithm takes patient pseudonyms and variable lists as inputs. The algorithms are implemented in Python using the PySpark framework, running on our data lake infrastructure.

Results: Our data request process has been in operational use since June 2022. Since then, we have served 121 projects with a total of 148 service requests. We discuss the hierarchical structure and the frequently used data items of our CCIS in detail and present an application example, including cohort selection, data extraction, and data transformation into an analysis-ready format.

Conclusions: Using clinical routine data for secondary research is challenging and requires an interdisciplinary team. We developed a scalable approach that automates cohort identification, data extraction, and common data pre-processing steps. Additionally, we facilitate data harmonization and integration, and consult on typical data analysis scenarios, machine learning algorithms, and visualizations in dashboards.
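The Methods description frames extraction as filtering by patient pseudonyms and variable lists, with variables selected through a hierarchical structure, followed by transformation into an analysis-ready format. The following is a minimal, hypothetical sketch of that idea in plain Python. It is not the authors' PySpark implementation: the catalog layout (VariableNode), the helper names resolve_variables, extract and to_wide, the example variable names, and the long-format record schema (patient, time, variable, value) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class VariableNode:
    """Node in a hypothetical hierarchical variable catalog (groups contain items)."""
    name: str
    children: List["VariableNode"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        # A node without children is a concrete data item; inner nodes are groupings.
        if not self.children:
            return [self.name]
        items: List[str] = []
        for child in self.children:
            items.extend(child.leaves())
        return items


def resolve_variables(root: VariableNode, requested: Iterable[str]) -> Set[str]:
    """Expand requested group or item names into the set of leaf data items.

    Names that do not occur in the catalog are silently ignored.
    """
    wanted = set(requested)
    result: Set[str] = set()

    def walk(node: VariableNode) -> None:
        if node.name in wanted:
            # Requesting a group selects every item below it.
            result.update(node.leaves())
            return
        for child in node.children:
            walk(child)

    walk(root)
    return result


def extract(records: Iterable[dict], pseudonyms: Iterable[str], variables: Set[str]) -> List[dict]:
    """Keep only long-format rows of cohort patients for the requested variables."""
    cohort = set(pseudonyms)
    return [r for r in records if r["patient"] in cohort and r["variable"] in variables]


def to_wide(rows: Iterable[dict]) -> Dict[Tuple[str, str], Dict[str, object]]:
    """Pivot long-format rows into one record per (patient, time) with one key per variable."""
    wide: Dict[Tuple[str, str], Dict[str, object]] = {}
    for r in rows:
        wide.setdefault((r["patient"], r["time"]), {})[r["variable"]] = r["value"]
    return wide
```

For example, with a catalog whose "Vitals" group holds "heart_rate" and "spo2", requesting ["Vitals", "lactate"] resolves to those three items, and extract then drops rows of patients outside the cohort as well as unrequested variables. On a Spark DataFrame in the same long format, the equivalent steps could be written with `df.filter(F.col("patient").isin(cohort) & F.col("variable").isin(variables))` and `df.groupBy("patient", "time").pivot("variable").agg(F.first("value"))`, where `F` is `pyspark.sql.functions` and `cohort` and `variables` are Python lists; the column names there are likewise assumptions.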
