Procedures of data merging in precision cancer medicine: the PRIME-ROSE project
Background and purpose: As more interventional clinical trials in Precision Cancer Medicine (PCM) are introduced, molecular descriptions of tumours have led to multiple subtypes, even within common tumour types. Therefore, the main limitation of these trials is the small number of eligible patients available to assess clinical benefit. The PRIME-ROSE project addresses this limitation by pooling data from multiple European Drug Rediscovery Protocol (DRUP)-like clinical trials, so that slowly accruing cohorts are accelerated. To achieve this, a well-documented, commonly approved procedure for data merging needs to be established.
Patient/material and methods: Data sharing is achievable when there is an organisation that includes people from different disciplines who can navigate institutional and country-specific information and governance requirements. Furthermore, alignment of all study procedures is needed before data are shared. Next, the process of merging data requires harmonisation and standardisation. Implementation of the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) facilitates future data aggregation.
Results: By aggregating data from European DRUP-like clinical trials, cohorts are completed that could not be completed in stand-alone studies. Since initiation, the PRIME-ROSE project has monitored over 300 cohorts across more than 20 treatments, encompassing over 1,000 patients. At least 20 cohorts have progressed after interim analysis.
Interpretation: Data sharing across European trials is feasible and enhances the advancement of PCM studies. The methodologies developed in the PRIME-ROSE project provide a foundation for future data integration efforts in PCM clinical trials, underscoring the viability of conducting robust trials in a global context.
- Research Article
1
- 10.1093/jamiaopen/ooaf052
- Jul 3, 2025
- JAMIA Open
Objective: Federated analysis is a method that allows data analysis to be performed on similar datasets without exchanging any data, thus facilitating international research collaboration while adhering to strict privacy laws. This study aimed to evaluate the feasibility of using federated analysis to benchmark mortality in 2 critical care quality registry databases converted to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), describing challenges to and recommendations for performing federated analysis on data transformed to OMOP CDM.
Materials and Methods: To identify as many challenges as possible and to be able to complete the benchmarking phase, a 2-step approach was taken during implementation. The first step was a naive implementation to allow challenges to surface naturally; the second step was developing solutions for the encountered challenges. Expected patient mortality risk was calculated by applying the Acute Physiology and Chronic Health Evaluation II (APACHE II) model to data from OMOP CDM databases containing adult ICU encounters between July 1, 2019 and December 31, 2022. An analysis script was developed to calculate comparable, registry-level standardized mortality ratios. Challenges were recorded and categorized into predefined categories: “data preparation,” “data analysis plan,” and “data interpretation.” Challenges specific to the OMOP CDM were further categorized using published steps from an existing generic harmonization process.
Results: A total of 7 challenges were identified, 4 of which were related to data preparation, 1 to data analysis, and 1 to data interpretation. Out of all 7 challenges, 4 stemmed from decisions made during the implementation of OMOP CDM. Several recommended solutions were distilled from the naive approach.
Discussion: Federated analysis facilitated by a CDM is a feasible option for critical care quality registries. However, future analysis is influenced by decisions made during the CDM implementation process. Thus, prior publication of data dictionaries and the use of metadata to communicate data handling and data source classification during CDM implementation will improve the efficiency and accuracy of subsequent analysis.
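The registry-level standardized mortality ratio (SMR) mentioned in the abstract above is conventionally the number of observed deaths divided by the number of expected deaths, where the expected count is the sum of per-encounter predicted mortality risks. The following is a minimal sketch under that assumption; the function name and the toy data are hypothetical, and the APACHE II risk calculation itself is not reproduced here.

```python
from typing import Sequence


def standardized_mortality_ratio(
    died: Sequence[bool], predicted_risk: Sequence[float]
) -> float:
    """Return observed deaths divided by expected deaths.

    `predicted_risk` holds one model-derived mortality probability per
    encounter (for example from APACHE II); how those probabilities are
    computed is outside the scope of this sketch.
    """
    if len(died) != len(predicted_risk):
        raise ValueError("died and predicted_risk must have the same length")
    expected = sum(predicted_risk)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    observed = sum(1 for d in died if d)
    return observed / expected


# Hypothetical toy data: 4 encounters, 2 deaths, 2.0 expected deaths.
toy_died = [True, False, False, True]
toy_risk = [0.5, 0.25, 0.25, 1.0]
toy_smr = standardized_mortality_ratio(toy_died, toy_risk)
```

In a federated setting, each site could run such a function locally and share only the aggregate ratio, so that no patient-level rows leave the registry.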
- Research Article
3
- 10.1016/j.compbiomed.2024.108411
- Apr 6, 2024
- Computers in Biology and Medicine
recruIT: A cloud-native clinical trial recruitment support system based on Health Level 7 Fast Healthcare Interoperability Resources (HL7 FHIR) and the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM)
- Research Article
21
- 10.2196/29259
- Oct 29, 2021
- Journal of Medical Internet Research
Background: Electronic health records (EHRs, such as those created by an anesthesia management system) generate a large amount of data that can notably be reused for clinical audits and scientific research. The sharing of these data and tools is generally affected by the lack of system interoperability. To overcome these issues, Observational Health Data Sciences and Informatics (OHDSI) developed the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) to standardize EHR data and promote large-scale observational and longitudinal research. Anesthesia data have not previously been mapped into the OMOP CDM.
Objective: The primary objective was to transform anesthesia data into the OMOP CDM. The secondary objective was to provide vocabularies, queries, and dashboards that might promote the exploitation and sharing of anesthesia data through the CDM.
Methods: Using our local anesthesia data warehouse, a group of 5 experts from 5 different medical centers identified local concepts related to anesthesia. The concepts were then matched with standard concepts in the OHDSI vocabularies. We performed structural mapping between the design of our local anesthesia data warehouse and the OMOP CDM tables and fields. To validate the implementation of anesthesia data into the OMOP CDM, we developed a set of queries and dashboards.
Results: We identified 522 concepts related to anesthesia care. They were classified as demographics, units, measurements, operating room steps, drugs, periods of interest, and features. After semantic mapping, 353 (67.7%) of these anesthesia concepts were mapped to OHDSI concepts. Further, 169 (32.3%) concepts related to periods and features were added to the OHDSI vocabularies. Then, 8 OMOP CDM tables were implemented with anesthesia data and 2 new tables (EPISODE and FEATURE) were added to store secondarily computed data. We integrated data from 572,609 operations and provided the code for a set of 8 queries and 4 dashboards related to anesthesia care.
Conclusions: Generic data concerning demographics, drugs, units, measurements, and operating room steps were already available in OHDSI vocabularies. However, most of the intraoperative concepts (the duration of specific steps, an episode of hypotension, etc) were not present in OHDSI vocabularies. The OMOP mapping provided here enables anesthesia data reuse.
- Research Article
2
- 10.1093/jamiaopen/ooaf022
- Mar 6, 2025
- JAMIA Open
Objectives: This work aims to develop a methodology for transforming Health Level 7 (HL7) Clinical Document Architecture (CDA) documents into the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). The described method seeks to improve the Extract, Transform, Load (ETL) design process by using HL7 CDA Template definitions and the CDA Refined Message Information Model (CDA R-MIM).
Material and Methods: Our approach utilizes HL7 CDA Templates to define structural and semantic mappings. Supported by the CDA R-MIM for semantic alignment with the OMOP CDM, we developed a tool named CDA Rabbit that enables the generation of Rabbit-In-a-Hat project files from HL7 CDA Template definitions and could be successfully integrated into the existing toolchain around OMOP.
Results: We tested our approach using 13 CDA Templates from the Austrian national EHR System (ELGA) and 430 anonymized CDA test documents that were mapped to 10 OMOP CDM tables. The data quality assessment, using OMOP’s DataQualityDashboard, showed a 99% pass rate, indicating a robust and accurate data transformation.
Conclusion: This study presents a novel framework for transforming HL7 CDA documents into OMOP CDM using template definitions and CDA R-MIM. The methodology improves semantic interoperability, mapping reusability, and ETL design efficiency. Future work should focus on automating code generation, improving data profiling, and addressing cyclic dependencies within CDA templates. The presented approach supports improved secondary use of health data and research while adhering to standardized data models and semantics.
Discussion: Using CDA Templates for ETL design addresses common ETL challenges, such as data accessibility during ETL design, by decoupling the process from the actual CDA instances. Future work could focus on extending this approach to automatically generate boilerplate code, address cyclic dependencies within CDA Templates, and adapt the method for use with FHIR profiles.
- Research Article
- 10.1136/bmjno-2025-001202
- Jul 1, 2025
- BMJ Neurology Open
Background: The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) is a standardised framework for organising healthcare data. This study uses data in the OMOP CDM format to analyse information on neurology patients.
Methods: Routinely collected data harmonised to OMOP at a large referral hospital in England were used. A study cohort was defined as patients who attended at least one neurology outpatient appointment between 01 April 2022 and 31 March 2023 (n=23 862). Data collected at all visits to the hospital made by this cohort between 01 April 2021 and 31 March 2024 were extracted. The cohort was then divided into four subcohorts according to appointment types attended: outpatient appointment(s) only (n=15 255); outpatient appointment(s) and inpatient stay(s) (n=2750); outpatient appointment(s) and emergency department attendance(s) (n=1658); outpatient appointment(s), inpatient stay(s) and emergency department attendance(s) (n=4199).
Results: We found there to be more data available for patients who had at least one inpatient stay or emergency department attendance than for those with only outpatient appointments. Notably, an average of 0 out of 100 patients in the outpatient only subcohort had a record of a condition, compared with 100 out of 100 patients in the subcohort with outpatient appointments, emergency attendances and inpatient stays.
Conclusions: Neurology outpatients have far less data recorded than inpatients or patients attending emergency departments. This disparity arises from the lack of outpatient diagnostic coding and impairs the advancement of research in this area. Using the OMOP CDM structure makes it easy to highlight these differences.
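The four-way subcohort split described in the abstract above can be expressed as a simple rule over the set of visit types each patient has. The sketch below is illustrative only: the string labels ("outpatient", "inpatient", "emergency") and the function name are assumptions, and in a real OMOP CDM database the visit type would be derived from the VISIT_OCCURRENCE table rather than from plain strings.

```python
from collections import Counter
from typing import Dict, Set


def classify_subcohort(visit_types: Set[str]) -> str:
    """Assign a patient to one of four mutually exclusive subcohorts.

    Every cohort member has at least one outpatient appointment, so a
    set without "outpatient" is rejected.
    """
    if "outpatient" not in visit_types:
        raise ValueError("cohort members must have an outpatient appointment")
    has_inpatient = "inpatient" in visit_types
    has_emergency = "emergency" in visit_types
    if has_inpatient and has_emergency:
        return "outpatient + inpatient + emergency"
    if has_inpatient:
        return "outpatient + inpatient"
    if has_emergency:
        return "outpatient + emergency"
    return "outpatient only"


# Hypothetical patients, keyed by an invented identifier.
toy_patients: Dict[str, Set[str]] = {
    "p1": {"outpatient"},
    "p2": {"outpatient", "inpatient"},
    "p3": {"outpatient", "emergency"},
    "p4": {"outpatient", "inpatient", "emergency"},
    "p5": {"outpatient"},
}
toy_counts = Counter(classify_subcohort(v) for v in toy_patients.values())
```

Because the four labels are mutually exclusive and exhaustive, the subcohort counts always sum to the size of the full cohort.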
- Research Article
5
- 10.2196/59187
- Jul 12, 2024
- JMIR medical informatics
Digital transformation, particularly the integration of medical imaging with clinical data, is vital in personalized medicine. The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) standardizes health data. However, integrating medical imaging remains a challenge. This study proposes a method for combining medical imaging data with the OMOP CDM to improve multimodal research. Our approach included the analysis and selection of Digital Imaging and Communications in Medicine (DICOM) header tags, validation of data formats, and alignment according to the OMOP CDM framework. The Fast Healthcare Interoperability Resources ImagingStudy profile guided our consistency in column naming and definitions. The Imaging Common Data Model (I-CDM), constructed using the entity-attribute-value model, facilitates scalable and efficient medical imaging data management. For patients with lung cancer diagnosed between 2010 and 2017, we introduced 4 new tables (IMAGING_STUDY, IMAGING_SERIES, IMAGING_ANNOTATION, and FILEPATH) to standardize various imaging-related data and link them to clinical data. This framework underscores the effectiveness of I-CDM in enhancing our understanding of lung cancer diagnostics and treatment strategies. The implementation of the I-CDM tables enabled the structured organization of a comprehensive data set, including 282,098 IMAGING_STUDY, 5,674,425 IMAGING_SERIES, and 48,536 IMAGING_ANNOTATION records, illustrating the extensive scope and depth of the approach. A scenario-based analysis using actual data from patients with lung cancer underscored the feasibility of our approach. A data quality check applying 44 specific rules confirmed the high integrity of the constructed data set, with all checks successfully passed, underscoring the reliability of our findings. These findings indicate that I-CDM can improve the integration and analysis of medical imaging and clinical data. 
By addressing the challenges in data standardization and management, our approach contributes toward enhancing diagnostics and treatment strategies. Future research should expand the application of I-CDM to diverse disease populations and explore its wide-ranging utility for medical conditions.
- Research Article
1
- 10.2196/60917
- Apr 2, 2025
- JMIR research protocols
The use of data standards is low across the health care system, and converting data to a common data model (CDM) is usually required to undertake international research. One such model is the Observational Medical Outcomes Partnership (OMOP) CDM. It has gained substantial traction among researchers and those who have developed data platforms. The Observational Health Data Sciences and Informatics (OHDSI) partnership manages OMOP and provides many open-source tools to assist in converting data to the OMOP CDM. The challenge, however, is in the skills, knowledge, know-how, and capacity within teams to convert their data to OMOP. The European Health Data and Evidence Network provided funds to allow data owners to bring in external resources to do the required conversions. The Carrot software (University of Nottingham) is a new set of open-source tools designed to help address these challenges while not requiring data access by external resources. The use of data protection rules is increasing, and privacy by design is a core principle under the European and UK legislation related to data protection. Our aims for the Carrot software were to have a standardized mechanism for managing the data curation process, capturing the rules used to convert the data, and creating a platform that can reuse rules across projects to drive standardization of process and improve speed without compromising on quality. Most importantly, we aimed to deliver this privacy-by-design approach without requiring data access for those creating the rules. The software was developed using Agile approaches by both software engineers and data engineers, who would ultimately use the system. Experts in OMOP were consulted to ensure the approaches were correct. An incremental release program was initiated to ensure we delivered continuous progress. 
Carrot has been delivered and used on a project called COVID-Curated and Open Analysis and Research Platform (CO-CONNECT) to assist in the process of allowing datasets to be discovered via a federated platform. It has been used to create over 45,000 rules, and over 5 million patient records have been converted. This has been achieved while maintaining our principle of not allowing access to the underlying data by the team creating the rules. It has also facilitated the reuse of existing rules, with most rules being reused rather than manually curated. Carrot has demonstrated how it can be used alongside existing OHDSI tools with a focus on the mapping stage. The COVID-Curated and Open Analysis and Research Platform project successfully managed to reuse rules across datasets. The approach is valid and brings the benefits expected, with future work continuing to optimize the generation of rules. RR1-10.2196/60917.
- Research Article
68
- 10.1186/s12874-021-01434-3
- Nov 2, 2021
- BMC Medical Research Methodology
Background: The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) can be used to transform observational health data to a common format. CDM transformation allows for analysis across disparate databases for the generation of new, real-world evidence, which is especially important in rare disease where data are limited. Pulmonary hypertension (PH) is a progressive, life-threatening disease, with rare subgroups such as pulmonary arterial hypertension (PAH), for which generating real-world evidence is challenging. Our objective is to document the process and outcomes of transforming registry data in PH to the OMOP CDM, and highlight challenges and our potential solutions.
Methods: Three observational studies were transformed from the Clinical Data Interchange Standards Consortium study data tabulation model (SDTM) to OMOP CDM format. OPUS was a prospective, multi-centre registry (2014–2020) and OrPHeUS was a retrospective, multi-centre chart review (2013–2017); both enrolled patients newly treated with macitentan in the US. EXPOSURE is a prospective, multi-centre cohort study (2017–ongoing) of patients newly treated with selexipag or any PAH-specific therapy in Europe and Canada. OMOP CDM version 5.3.1 with recent OMOP CDM vocabulary was used. Imputation rules were defined and applied for missing dates to avoid exclusion of data. Custom target concepts were introduced when existing concepts did not provide sufficient granularity.
Results: Of the 6622 patients in the three registry studies, records were mapped for 6457. Custom target concepts were introduced for PAH subgroups (by combining SNOMED concepts or creating custom concepts) and World Health Organization functional class. Per the OMOP CDM convention, records about the absence of an event, or the lack of information, were not mapped. Excluding these non-event records, 4% (OPUS), 2% (OrPHeUS) and 1% (EXPOSURE) of records were not mapped.
Conclusions: SDTM data from three registries were transformed to the OMOP CDM with limited exclusion of data and deviation from the SDTM database content. Future researchers can apply our strategy and methods in different disease areas, with tailoring as necessary. Mapping registry data to the OMOP CDM facilitates more efficient collaborations between researchers and establishment of federated data networks, which is an unmet need in rare diseases.
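The abstract above mentions imputation rules for missing dates but does not state them. As an illustration of what such a rule can look like, the sketch below fills a missing day with the first of the month and a missing month with January. These specific rules, and the function name, are assumptions made for this example and are not the rules used in the study.

```python
from datetime import date
from typing import Optional


def impute_partial_date(
    year: Optional[int],
    month: Optional[int] = None,
    day: Optional[int] = None,
) -> Optional[date]:
    """Impute a full date from a partial one (hypothetical rules).

    - year missing: cannot impute, return None
    - month missing: assume 1 January of the given year
    - day missing: assume the first day of the given month
    """
    if year is None:
        return None
    if month is None:
        return date(year, 1, 1)
    if day is None:
        return date(year, month, 1)
    return date(year, month, day)


# Hypothetical partial dates as they might appear in a source record.
full = impute_partial_date(2016, 5, 17)
no_day = impute_partial_date(2016, 5)
no_month = impute_partial_date(2016)
no_year = impute_partial_date(None)
```

Whatever rules are chosen, recording them alongside the transformed data lets downstream analysts tell observed dates from imputed ones.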
- Research Article
5
- 10.2196/47959
- Nov 7, 2023
- JMIR Medical Informatics
National classifications and terminologies already routinely used for documentation within patient care settings enable the unambiguous representation of clinical information. However, the diversity of different vocabularies across health care institutions and countries is a barrier to achieving semantic interoperability and exchanging data across sites. The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) enables the standardization of structure and medical terminology. It allows the mapping of national vocabularies into so-called standard concepts, representing normative expressions for international analyses and research. Within our project "Hybrid Quality Indicators Using Machine Learning Methods" (Hybrid-QI), we aim to harmonize source codes used in German claims data vocabularies that are currently unavailable in the OMOP CDM. This study aims to increase the coverage of German vocabularies in the OMOP CDM. We aim to completely transform the source codes used in German claims data into the OMOP CDM without data loss and make German claims data usable for OMOP CDM-based research. To prepare the missing German vocabularies for the OMOP CDM, we defined a vocabulary preparation approach consisting of the identification of all codes of the corresponding vocabularies, their assembly into machine-readable tables, and the translation of German designations into English. Furthermore, we used 2 proposed approaches for OMOP-compliant vocabulary preparation: the mapping to standard concepts using the Observational Health Data Sciences and Informatics (OHDSI) tool Usagi and the preparation of new 2-billion concepts (ie, concept_id >2 billion). Finally, we evaluated the prepared vocabularies regarding completeness and correctness using synthetic German claims data and calculated the coverage of German claims data vocabularies in the OMOP CDM. 
Our vocabulary preparation approach was able to map 3 missing German vocabularies to standard concepts and prepare 8 vocabularies as new 2-billion concepts. The completeness evaluation showed that the prepared vocabularies cover 44.3% (3288/7417) of the source codes contained in German claims data. The correctness evaluation revealed that the specified validity periods in the OMOP CDM are compliant for the majority (705,531/706,032, 99.9%) of source codes and associated dates in German claims data. The calculation of the vocabulary coverage showed a noticeable decrease of missing vocabularies from 55% (11/20) to 10% (2/20) due to our preparation approach. By preparing 10 vocabularies, we showed that our approach is applicable to any type of vocabulary used in a source data set. The prepared vocabularies are currently limited to German vocabularies, which can only be used in national OMOP CDM research projects, because the mapping of new 2-billion concepts to standard concepts is missing. To participate in international OHDSI network studies with German claims data, future work is required to map the prepared 2-billion concepts to standard concepts.
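The two preparation paths described above (mapping a source code to an existing standard concept, or creating a new concept with a concept_id above 2 billion) can be sketched as follows. The function name, the source codes, and the standard concept_id in the example are hypothetical; only the "greater than 2 billion" convention comes from the abstract.

```python
from typing import Dict, Iterable

# Custom concepts use identifiers above 2 billion, per the abstract.
CUSTOM_CONCEPT_ID_START = 2_000_000_001


def assign_concept_ids(
    source_codes: Iterable[str],
    standard_map: Dict[str, int],
    start: int = CUSTOM_CONCEPT_ID_START,
) -> Dict[str, int]:
    """Return a concept_id for every distinct source code.

    Codes present in `standard_map` keep their standard concept_id;
    all others receive consecutive custom identifiers beginning at
    `start`, in sorted order so that the assignment is reproducible.
    """
    assigned: Dict[str, int] = {}
    next_id = start
    for code in sorted(set(source_codes)):
        if code in standard_map:
            assigned[code] = standard_map[code]
        else:
            assigned[code] = next_id
            next_id += 1
    return assigned


# Hypothetical source codes; "SRC-A" is assumed to have a standard mapping.
toy_assigned = assign_concept_ids(
    ["SRC-B", "SRC-A", "SRC-C", "SRC-B"],
    standard_map={"SRC-A": 12345},
)
```

As the abstract notes, codes that only receive a custom identifier remain usable in local analyses but still need a mapping to standard concepts before they can take part in international network studies.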
- Research Article
- 10.3233/shti240730
- Aug 22, 2024
- Studies in health technology and informatics
This paper presents the development of a visualization dashboard for quality indicators in intensive care units (ICUs), using the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). The dashboard enables the user to visualize quality indicator data using histograms, pie charts and tables. Our project uses the OMOP CDM, ensuring a seamless implementation of our dashboard across various hospitals. Future directions for our research include expanding the dashboard to incorporate additional quality indicators and evaluating clinicians' feedback on its effectiveness.
- Research Article
- 10.1186/s12911-026-03375-7
- Feb 28, 2026
- BMC medical informatics and decision making
Breast cancer pathology reports provide important clinical research data, but because they are written as free text, previous studies have extracted data from them through natural language processing (NLP). This study aimed to present the process of standardizing the data extracted from pathology reports to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) and demonstrate its potential utility. Data were extracted from 11,374 immunohistochemistry, 5,904 molecular pathology, and 13,857 surgical pathology reports from 10,730 patients with breast cancer at a tertiary general hospital between May 2003 and December 2022. Using rule-based NLP, which includes text pre-processing, report segmentation, and pattern-based extraction, along with the mapping of the extracted data to OMOP standard vocabularies, we integrated breast cancer pathology data into the pre-existing OMOP-CDM. NLP-derived data from the pathology reports were loaded into the NOTE_NLP, CONDITION_OCCURRENCE, MEASUREMENT, SPECIMEN, FACT_RELATIONSHIP, EPISODE, and EPISODE_EVENT tables. In a feasibility study, patient-level prediction analyses were conducted to demonstrate the utility of the newly added breast cancer pathology data and open-source tools for OMOP-CDM, yielding an Area Under the Receiver Operating Characteristic curve (AUROC) of 0.840 for predicting all-cause mortality within 5 years after the initial pathologic diagnosis. By integrating breast cancer pathology information into the OMOP-CDM, this study facilitates the unified analysis of clinical and pathological data for retrospective oncology research. Moreover, the proposed NLP-based framework for standardizing pathology reports and integrating them into the OMOP-CDM can be readily adopted by other institutions as the use of OMOP-CDM continues to expand in cancer research.
- Research Article
1
- 10.2196/49542
- Aug 13, 2024
- JMIR medical informatics
Patient-monitoring software generates a large amount of data that can be reused for clinical audits and scientific research. The Observational Health Data Sciences and Informatics (OHDSI) consortium developed the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) to standardize electronic health record data and promote large-scale observational and longitudinal research. This study aimed to transform primary care data into the OMOP CDM format. We extracted primary care data from electronic health records at a multidisciplinary health center in Wattrelos, France. We performed structural mapping between the design of our local primary care database and the OMOP CDM tables and fields. Local French vocabulary concepts were mapped to OHDSI standard vocabularies. To validate the implementation of primary care data into the OMOP CDM format, we applied a set of queries. A practical application was achieved through the development of a dashboard. Data from 18,395 patients were implemented into the OMOP CDM, corresponding to 592,226 consultations over a period of 20 years. A total of 18 OMOP CDM tables were implemented. A total of 17 local vocabularies were identified as being related to primary care and corresponded to patient characteristics (sex, location, year of birth, and race), units of measurement, biometric measures, laboratory test results, medical histories, and drug prescriptions. During semantic mapping, 10,221 primary care concepts were mapped to standard OHDSI concepts. Five queries were used to validate the OMOP CDM by comparing the results obtained after the completion of the transformations with the results obtained in the source software. Lastly, a prototype dashboard was developed to visualize the activity of the health center, the laboratory test results, and the drug prescription data. Primary care data from a French health care facility have been implemented into the OMOP CDM format. 
Data concerning demographics, units, measurements, and primary care consultation steps were already available in OHDSI vocabularies. Laboratory test results and drug prescription data were mapped to available vocabularies and structured in the final model. A dashboard application provided health care professionals with feedback on their practice.
- Research Article
14
- 10.1002/cpt.1785
- Mar 2, 2020
- Clinical Pharmacology and Therapeutics
Exploring and combining results from more than one real‐world data (RWD) source might be necessary in order to explore variability and demonstrate generalizability of the results or for regulatory requirements. However, the heterogeneous nature of RWD poses challenges when working with more than one source, some of which can be solved by analyzing databases converted into a common data model (CDM). The main objective of the study was to evaluate the implementation of the Observational Medical Outcomes Partnership (OMOP) CDM on IQVIA Medical Research Data (IMRD)‐UK data. A drug utilization study describing the prescribing of codeine for pain in children was used as a case study to be replicated in IMRD‐UK and its corresponding OMOP CDM transformation. Differences between IMRD‐UK source and OMOP CDM were identified and investigated. In IMRD‐UK updated to May 2017, results were similar between source and transformed data with few discrepancies. These were the result of different conventions applied during the transformation regarding the date of birth for children younger than 15 years and the start of the observation period, and of a misclassification of two drug treatments. After the initial analysis and feedback provided, a rerun of the analysis in IMRD‐UK updated to September 2018 showed almost identical results for all the measures analyzed. For this study, the conversion to OMOP CDM was adequate. Although some decisions and mappings could be improved, these affected the absolute results but not the study inferences. This validation study supports six recommendations for good practice in transforming to CDMs.
- Research Article
66
- 10.1016/j.jbi.2019.103253
- Jul 17, 2019
- Journal of Biomedical Informatics
Facilitating phenotype transfer using a common data model
- Conference Article
3
- 10.1109/bibm.2018.8621269
- Dec 1, 2018
Rapid increase in the implementation of electronic health records (EHRs) has led to an unprecedented expansion in the availability of dense longitudinal cohort datasets for clinical studies. However, there is a growing need to ensure data traceability, validity, and reproducibility for EHR-based clinical research. Applying common data models that standardize EHR data elements could reduce research discrepancies and improve research reproducibility. As a pilot study, we utilized the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) developed by the Observational Health Data Sciences and Informatics (OHDSI) community to annotate cohort data elements from the local Rochester Epidemiology Project (REP). We evaluated the data coverage of the OMOP CDM by manually annotating the cohorts from 92 REP publications. Next, we examined cohort similarities between different publications using OMOP elements. Evaluation results showed that the OMOP CDM covers 99.8% of the content that is associated with cohort attributes. It demonstrated that the OMOP CDM can be used for data element standardization when extracting information from EHR and clinical registries. The OMOP CDM also shows its potential to be used as a tool for retrospective examination of cohort definition consistencies and epidemiology model similarities.