Abstract

Aim: Data harmonization standardizes healthcare information, improving accessibility and interoperability, which is crucial for improving patient outcomes and driving medical research and innovation. It enables more precise diagnoses and personalized treatments and makes AI models more efficient. However, ethical concerns, technical barriers across the data lifecycle, AI biases, and varied regional regulations impede progress. These challenges underscore the need for solutions such as the adoption of universal standards like HL7 FHIR, particularly given the significant lack of generalized harmonization efforts. Methods: We propose a holistic framework that uses FAIR-compliant reference ontologies (based on the FAIRplus and FAIR CookBook criteria) to make data findable, accessible, interoperable, and reusable. The ontologies are enriched with terminologies from OHDSI (Observational Health Data Sciences and Informatics) vocabularies and with word embeddings to identify lexical and conceptual overlaps across heterogeneous data models. Results: The proposed approach was applied to autoimmune diseases, cardiovascular diseases, and mental disorders using unstructured data from EU cohorts involving 7,551 patients with primary Sjögren’s syndrome, 25,000 patients with cardiovascular diseases, and 3,500 patients with depression and anxiety. Metadata from these datasets were structured into dictionaries and linked with three newly developed reference ontologies (ROPSS, ROCVD, and ROMD), which are accessible on GitHub. These ontologies facilitated data interoperability across different systems and helped identify common terminologies with high precision within each domain. Conclusion: Through the proposed framework, we urge the adoption of data harmonization as a priority, emphasizing the need for global cooperation, investment in technology and infrastructure, and adherence to ethical data usage practices, toward a more efficient and patient-centered global healthcare system.
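
To make the idea of identifying lexical overlaps between a cohort data dictionary and a reference ontology more concrete, the following is a minimal sketch, not the authors' implementation. It substitutes character-trigram Jaccard similarity for the word embeddings named in the abstract, and the function names, the threshold, and the example terms are all hypothetical.

```python
from __future__ import annotations


def char_ngrams(term: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of a normalized term."""
    # Lowercase, treat underscores as spaces, collapse whitespace, pad with spaces.
    normalized = f" {' '.join(term.lower().replace('_', ' ').split())} "
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_terms(
    source_terms: list[str],
    reference_terms: list[str],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> dict[str, tuple[str | None, float]]:
    """Map each source term to its best-scoring reference term, if any."""
    matches: dict[str, tuple[str | None, float]] = {}
    for source in source_terms:
        source_grams = char_ngrams(source)
        best_term, best_score = None, 0.0
        for reference in reference_terms:
            score = jaccard(source_grams, char_ngrams(reference))
            if score > best_score:
                best_term, best_score = reference, score
        # Keep the match only when it clears the threshold.
        matches[source] = (best_term if best_score >= threshold else None, best_score)
    return matches


if __name__ == "__main__":
    # Hypothetical variable names and ontology labels, for illustration only.
    cohort_variables = ["Heart_Rate", "Systolic_BP", "xyz"]
    ontology_labels = ["heart rate", "systolic blood pressure"]
    for variable, (label, score) in match_terms(cohort_variables, ontology_labels).items():
        print(f"{variable!r} -> {label!r} (score={score:.2f})")
```

A purely lexical measure like this only captures surface similarity; conceptual overlaps between differently worded terms are what an embedding-based comparison would be needed for.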
