ETL Process Research Articles

Background: Retrospective research on real-world data can yield evidence on specific topics, especially when it runs across different sites in research networks. Such networks have become increasingly relevant in recent years, not least because of the special situation caused by the COVID-19 pandemic. An important requirement for these networks is data harmonization that ensures semantic interoperability. Aims: In this paper we demonstrate (1) how to use digital infrastructures to run a retrospective study in a research network spread across university and non-university hospital sites, and (2) how to answer a medical question on the COVID-19-related change in diagnosis counts for diabetes-related eye diseases. Materials and methods: The study is retrospective and non-interventional and runs on medical case data documented in routine care at the participating sites. The technical infrastructure consists of the OMOP CDM and other OHDSI tools and is provided in a transferable format. An ETL process was used to transfer and harmonize the data to the OMOP CDM. Cohort definitions for each year under observation were created centrally, applied locally to the medical case data of all participating sites, and analyzed with descriptive statistics. Results: The analyses showed an expected drop in the total number of diagnoses and in diabetes diagnoses in general, whereas the number of diagnoses for diabetes-related eye diseases surprisingly decreased more strongly than that for non-eye diseases. Differences between sites in the relative changes of diagnosis counts show an urgent need to conduct multi-centric rather than single-site studies to reduce bias in the data. Conclusions: This study demonstrated that an existing portable and standardized infrastructure and ETL process from a university hospital setting can be transferred to non-university sites. From a medical perspective, further work is needed to evaluate the quality of the real-world data documented in routine care and to investigate its suitability for research.
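The relative change in diagnosis counts that the abstract compares between sites can be illustrated with a small calculation. This is only a sketch: the function name, the site labels, and all counts below are hypothetical placeholders and are not taken from the study.

```python
def relative_change(before: int, after: int) -> float:
    """Return the percentage change from `before` to `after`."""
    if before == 0:
        raise ValueError("baseline count must be non-zero")
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # Hypothetical per-site counts (baseline year, pandemic year).
    # These numbers are made up for illustration only.
    eye_diagnosis_counts = {
        "site_a": (200, 150),
        "site_b": (80, 72),
    }
    for site, (baseline, pandemic) in eye_diagnosis_counts.items():
        change = relative_change(baseline, pandemic)
        print(f"{site}: {change:+.1f}%")
```

Computing the change per site before pooling makes differences between sites visible, which is the kind of comparison the abstract uses to argue for multi-centric studies.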

Objective: The large-scale collection of observational data and digital technologies could help curb the COVID-19 pandemic. However, the coexistence of multiple Common Data Models (CDMs) and the lack of extract, transform, and load (ETL) tools between different CDMs cause potential interoperability issues between data systems. The objective of this study is to design, develop, and evaluate an ETL tool that transforms PCORnet CDM format data into the OMOP CDM. Methods: We developed an open-source ETL tool to facilitate data conversion from the PCORnet CDM to the OMOP CDM. The ETL tool was evaluated using a dataset of 1000 patients randomly selected from the PCORnet CDM at Mayo Clinic. Information loss, data mapping accuracy, and gap analysis were used to assess the performance of the ETL tool. We designed an experiment that carries out a real-world COVID-19 surveillance task to assess the feasibility of the ETL tool. We also assessed the capacity of the ETL tool for COVID-19 data surveillance using the data collection criteria of the MN EHR Consortium COVID-19 project. Results: After the ETL process, all records of the 1000 patients from 18 PCORnet CDM tables were successfully transformed into 12 OMOP CDM tables. The information loss for all concept mappings was less than 0.61%. The string mapping process for the unit concepts lost 2.84% of records. Almost all fields in the manual mapping process achieved 0% information loss, the exception being the specialty concept mapping. Moreover, the mapping accuracy for all fields was 100%. The COVID-19 surveillance task collected almost the same set of cases (99.3% overlap) from the original PCORnet CDM and the target OMOP CDM separately. Finally, all data elements for the MN EHR Consortium COVID-19 project could be captured from both the PCORnet CDM and the OMOP CDM. Conclusion: We demonstrated that our ETL tool could satisfy the data conversion requirements between the PCORnet CDM and the OMOP CDM. The outcome of this work would facilitate data retrieval, communication, sharing, and analysis between institutions, not only for COVID-19-related projects but also for other real-world evidence-based observational studies.
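To make the information-loss figures above more concrete, the following sketch shows one way such a rate could be computed for a concept mapping. It assumes that information loss means the share of source records whose value has no target concept. The abstract does not spell out its exact definition, so this is an assumption, and the function name, unit strings, and concept IDs are hypothetical placeholders rather than parts of the authors' tool.

```python
from typing import Mapping, Sequence


def information_loss(source_values: Sequence[str],
                     concept_map: Mapping[str, int]) -> float:
    """Percentage of source records with no mapped target concept.

    Assumed definition: a record counts as lost when its source value
    is absent from `concept_map`.
    """
    if not source_values:
        return 0.0
    unmapped = sum(1 for value in source_values if value not in concept_map)
    return 100.0 * unmapped / len(source_values)


if __name__ == "__main__":
    # Hypothetical unit strings from source records.
    units = ["mg/dL", "mmol/L", "mg/dL", "mg per dl"]
    # Hypothetical string-to-concept lookup; the IDs are placeholders,
    # not real OMOP concept IDs.
    unit_concepts = {"mg/dL": 1001, "mmol/L": 1002}
    print(f"information loss: {information_loss(units, unit_concepts):.2f}%")
```

In this toy input the free-text variant "mg per dl" has no entry in the lookup, so one of the four records is counted as lost.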

Articles published on ETL Process

Data Warehousing Process Modeling from Classical Approaches to New Trends: Main Features and Comparisons

Big Data ETL Process and Its Impact on Text Mining Analysis for Employees’ Reviews

Study of ETL Process and Its Testing Techniques

How to Optimize Connection Between PACS and Clinical Data Warehouse: A Web Service Approach Based on Full Metadata Integration.

Opportunities of Digital Infrastructures for Disease Management-Exemplified on COVID-19-Related Change in Diagnosis Counts for Diabetes-Related Eye Diseases.

Dynamic multi-variant relational scheme-based intelligent ETL framework for healthcare management.

OVERVIEW OF OPTIMIZATION METHODS FOR PRODUCTIVITY OF THE ETL PROCESS

Automated data extraction into a cataract surgery registry: Automatic investigation of results in the registry on ophthalmological analysis (ROPHA)

Developing an ETL tool for converting the PCORnet CDM into the OMOP CDM to facilitate the COVID-19 data integration

Towards Harmonized Data Quality in the Medical Informatics Initiative - Current State and Future Directions.

Predictive models assessment based on CRISP-DM methodology for students performance in Colombia - Saber 11 Test

Managing vulnerabilities during the development of a secure ETL processes

Analisis Dynamic ETL Incremental Load untuk Data Integration Datawarehouse

Dashboard Business Intelligence Vusialisasi Data Akreditasi Sekolah Pada SMP Negeri 1 Sembawa

Making EHRs Reusable: A Common Framework of Data Operations.

Implementasi Dashboard Untuk Visualisasi Data Penerimaan Mahasiswa Baru Studi Kasus : Universitas Kristen Duta Wacana

Building a federated research infrastructure for a policy-rapid response

MDA Approach for Designing and Developing Data Warehouses: A Systematic Review & Proposal
