Abstract

Clinical data are heterogeneous in format and standard, spanning structured, semi-structured, and unstructured data, and they contain sensitive information. Both characteristics call for specific approaches to extracting the valuable information buried in such data. Although many challenges remain unresolved when this information must be processed and reused, recent techniques based on machine learning and big data analytics can support information extraction for the secondary use of clinical data. In particular, these techniques can facilitate the transformation of heterogeneous data into a common standard format. They can also be exploited to define anonymization or pseudonymization approaches that respect the privacy requirements stated in the General Data Protection Regulation (GDPR), the Health Insurance Portability and Accountability Act (HIPAA), and other national and regional laws. Compliance with these laws requires that only de-identified clinical and personal data be processed for secondary analyses, in particular when data are shared or exchanged across institutions. This work proposes a modular architecture capable of collecting clinical data from heterogeneous sources and transforming them into data suitable for secondary uses, such as research, governance, and medical education. The proposed architecture uses appropriate modules and algorithms to carry out the transformations (pseudonymization and standardization) required for secondary use, and provides efficient tools to facilitate retrieval and analysis. Preliminary experimental tests show good accuracy in quantitative evaluations.

Highlights

  • The large availability of health data and documents in digital format managed in Health Information Systems (HISs) facilitates the advancement of clinical research, the improvement of care services through academic training, and the support of administrative and health processes through control and governance [1]

  • The definition of an architecture that integrates heterogeneous health and clinical data according to the Fast Healthcare Interoperability Resources (FHIR) interoperability standard, including the capability of automatically selecting and applying the most suitable pseudonymization algorithms to enable the secondary use of clinical data; the proposal of a solution to analyze the resulting large collections of FHIR resources

  • Adequate and efficient use of data from heterogeneous sources was achieved by representing the processed information in a standard, compliant form based on the HL7 FHIR format
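
The highlights mention pseudonymization and a FHIR-based representation of the processed data. As a minimal sketch of what such a step could look like, and not the method used in the paper, the following Python example replaces a direct identifier with a keyed pseudonym and maps a source record to a small FHIR-style Patient resource. The source field names (`patient_id`, `gender`, `birth_date`), the secret key, and the `urn:example:pseudonym` namespace are assumptions made for illustration only.

```python
import hashlib
import hmac

# Hypothetical secret key; a real deployment would keep it in a protected key store.
SECRET_KEY = b"example-key-not-for-production"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a deterministic keyed pseudonym (HMAC-SHA256, hex) for an identifier."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def to_fhir_patient(record: dict) -> dict:
    """Map a source record (hypothetical field names) to a minimal Patient resource."""
    return {
        "resourceType": "Patient",
        "identifier": [
            {
                "system": "urn:example:pseudonym",  # assumed namespace
                "value": pseudonymize(record["patient_id"]),
            }
        ],
        "gender": record["gender"],
        # Keep only the year of birth; the name is dropped entirely.
        "birthDate": record["birth_date"][:4],
    }


source = {
    "patient_id": "ABC123",
    "name": "Jane Doe",
    "gender": "female",
    "birth_date": "1980-05-17",
}
patient = to_fhir_patient(source)
```

Because the pseudonym is deterministic for a given key, records belonging to the same patient can still be linked after transformation, while the original identifier cannot be recovered without the key. Whether this level of protection is sufficient depends on the applicable regulation and on the other attributes retained in the resource.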


Summary

Introduction

The large availability of health data and documents in digital format managed in Health Information Systems (HISs) facilitates the advancement of clinical research, the improvement of care services through academic training, and the support of administrative and health processes through control and governance (i.e., clinical audits for quality enhancement and clinical governance) [1]. The main contributions of this work are the definition of an architecture that integrates heterogeneous health and clinical data according to the Fast Healthcare Interoperability Resources (FHIR) interoperability standard, including the capability of automatically selecting and applying the most suitable pseudonymization algorithms to enable the secondary use of clinical data, and the proposal of a solution to analyze the resulting large collections of FHIR resources. After this introduction, the paper is organized as follows.

International Regulation and Law
ETL Process
Data De-Identification
HL7 CDA and HL7 FHIR Standards
Related Works
System Architecture
Extraction Module
Loader Module
Data Retrieval Module
Implementation Details
Discussion
Example
Data minimization
Conclusions