Querying clinical data in HL7 RIM based relational model with morph-RDB

Freddy Priyatna, Sergio Paraiso-Medina, Oscar Corcho, Raul Alonso-Calvo

doi:10.1186/s13326-017-0155-8

Open Access

Abstract

Background

Semantic interoperability is essential when carrying out post-genomic clinical trials in which several institutions collaborate, since researchers and developers need an integrated view of, and access to, heterogeneous data sources. One possible approach to meeting this need is to use RDB2RDF systems that provide RDF datasets as the unified view. These RDF datasets may be materialized and stored in a triple store, or transformed into RDF on demand, as virtual RDF data sources. Our previous efforts relied on materialized RDF datasets and therefore sacrificed data freshness.

Results

In this paper we present a solution that uses an ontology based on the HL7 v3 Reference Information Model (RIM) and a set of R2RML mappings that relate this ontology to an underlying relational database implementation, with morph-RDB used to expose a virtual, non-materialized SPARQL endpoint over the data.

Conclusions

By applying a set of optimization techniques to the SPARQL-to-SQL query translation algorithm, we can now issue SPARQL queries over the underlying relational data with generally acceptable performance.
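The virtual (non-materialized) approach can be illustrated with a toy translator. The Python sketch below, which uses only the standard sqlite3 module, is not morph-RDB's algorithm: it assumes a hypothetical mapping in which one class is backed by one table and each property by one column (loosely modelled on an R2RML TriplesMap), and it translates a star-shaped SPARQL basic graph pattern (one subject, several predicates) into a single SQL query that runs directly against the relational data, so no RDF is ever stored. The names ex:Person, person, birth_year and the IRI template are illustrative assumptions.

```python
import sqlite3

# Hypothetical mapping, loosely modelled on an R2RML TriplesMap: one class is
# backed by one table, the subject IRI is built from the key column, and each
# RDF property is backed by one column. All names are illustrative.
MAPPING = {
    "ex:Person": {
        "table": "person",
        "key": "id",
        "template": "http://example.org/person/{id}",
        "properties": {"ex:name": "name", "ex:birthYear": "birth_year"},
    },
}


def translate(cls, pattern):
    """Translate the star pattern `?s a cls ; p1 ?v1 ; p2 ?v2 ...` into SQL.

    `pattern` maps each predicate to its SPARQL variable. Returns the SQL
    text and the variables in the order of the SELECT columns.
    """
    tm = MAPPING[cls]
    columns = [tm["key"]]
    variables = ["?s"]
    for predicate, variable in pattern.items():
        columns.append(tm["properties"][predicate])
        variables.append(variable)
    # All triple patterns share the subject, so they are answered from one row
    # of one table instead of one self-join per pattern. A NULL value yields no
    # triple in R2RML, hence the IS NOT NULL filters.
    conditions = [f"{c} IS NOT NULL" for c in columns[1:]]
    sql = f"SELECT {', '.join(columns)} FROM {tm['table']}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, variables


def run(conn, cls, pattern):
    """Execute the translated query and bind each row to SPARQL variables."""
    tm = MAPPING[cls]
    sql, variables = translate(cls, pattern)
    results = []
    for row in conn.execute(sql):
        subject = tm["template"].format(**{tm["key"]: row[0]})
        results.append(dict(zip(variables, (subject,) + row[1:])))
    return results


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, birth_year INTEGER)")
    conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                     [(1, "Alice", 1970), (2, "Bob", None)])
    # prints [{'?s': 'http://example.org/person/1', '?n': 'Alice', '?y': 1970}]
    print(run(conn, "ex:Person", {"ex:name": "?n", "ex:birthYear": "?y"}))
```

Bob is absent from the result because his birth_year is NULL, so no ex:birthYear triple exists for him. Answering every triple pattern of the star from the same row, instead of emitting one self-join per pattern, is the intuition behind self-join elimination, a common optimization in SPARQL-to-SQL translation.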

Highlights

In recent years, clinical trials have started introducing genomic variables [1].
As we use a consistent naming convention when implementing the HL7 Reference Information Model (RIM) in both the ontology and the database schema, we can create an initial version of our R2RML mappings in a direct mapping [26] fashion, which is useful for bootstrapping the mapping generation task.
We were interested in comparing morph-RDB with another well-established relational database to RDF (RDB2RDF) engine, D2R, considering the total time required to execute the SPARQL queries.
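Bootstrapping mappings in a direct mapping fashion could look like the following Python sketch, which emits R2RML in Turtle syntax from a table description. It assumes, hypothetically, that snake_case table names correspond to PascalCase ontology classes and snake_case column names to camelCase properties; the schema, the rim: namespace and the data IRI base are placeholders rather than the authors' actual conventions, and the sketch does not implement the W3C Direct Mapping specification itself. The R2RML terms it uses (rr:logicalTable, rr:tableName, rr:subjectMap, rr:template, rr:class, rr:predicateObjectMap, rr:predicate, rr:objectMap, rr:column) come from the W3C R2RML vocabulary.

```python
# Hypothetical schema description: table -> (primary key column, other columns).
SCHEMA = {
    "act": ("id", ["class_code", "mood_code"]),
    "entity": ("id", ["class_code", "determiner_code"]),
}

PREFIXES = (
    "@prefix rr: <http://www.w3.org/ns/r2rml#> .\n"
    "@prefix rim: <http://example.org/rim#> .\n"  # placeholder ontology namespace
)
DATA_BASE = "http://example.org/data/"  # placeholder base for subject IRIs


def pascal(name):
    """snake_case -> PascalCase, e.g. act_relationship -> ActRelationship."""
    return "".join(part.capitalize() for part in name.split("_"))


def camel(name):
    """snake_case -> camelCase, e.g. class_code -> classCode."""
    p = pascal(name)
    return p[:1].lower() + p[1:]


def triples_map(table, key, columns):
    """Emit one R2RML TriplesMap: table -> class, column -> property."""
    cls = pascal(table)
    lines = [
        f"<#{cls}Map>",
        f'    rr:logicalTable [ rr:tableName "{table}" ] ;',
        f'    rr:subjectMap [ rr:template "{DATA_BASE}{table}/{{{key}}}" ; rr:class rim:{cls} ] ;',
    ]
    for col in columns:
        lines.append(
            f'    rr:predicateObjectMap [ rr:predicate rim:{camel(col)} ; '
            f'rr:objectMap [ rr:column "{col}" ] ] ;'
        )
    lines[-1] = lines[-1][:-1] + "."  # the last statement ends with "." not ";"
    return "\n".join(lines)


def generate(schema):
    """Concatenate the prefixes and one TriplesMap per table."""
    maps = [triples_map(t, key, cols) for t, (key, cols) in schema.items()]
    return PREFIXES + "\n" + "\n\n".join(maps) + "\n"


if __name__ == "__main__":
    print(generate(SCHEMA))
```

The generated mappings are only a starting point: joins between tables (for example, via rr:parentTriplesMap) and any class or property whose name does not follow the convention would still need to be written by hand.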

Summary

Introduction

Clinical trials have started introducing genomic variables [1]. This requires performing patient stratification when selecting the population of patients to whom a clinical trial will be applied. The relevant datasets, commonly produced by different institutions and generally rather heterogeneous, need to be used for patient stratification [2]. Interoperability among these datasets is made easier by the use of biomedical standards and terminologies [3].

HL7 RIM

Recent years have witnessed a huge increase in the number of biomedical databases [19]. This increased availability opens up new opportunities, while also posing important new challenges, especially with respect to their integration, which is crucial to obtain a proportional increase in knowledge in the biomedical area. Among the many Detailed Clinical Models that have been reviewed for the integration of biomedical datasets [20], the HL7 v3 RIM is one of the most relevant, since the main requirement for the common data model (CDM) is that any data coming from clinical institutions can be represented in it without loss of information. We previously defined a relational model for it, which is shown in Fig. 1 and described in [13].
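To make the stratification use case concrete, the sketch below runs a cohort query in SQLite over a deliberately simplified, hypothetical subset of the RIM backbone classes (Entity, Role, Act, Participation). It is not the relational model of Fig. 1: table and column names are assumptions, and the observation code "BRCA1" with values "positive" and "negative" is a placeholder rather than a code from a real terminology. The HL7 codes used are PSN (person entity), PAT (patient role), OBS (observation act) and RCT (record target participation).

```python
import sqlite3

# Simplified, hypothetical subset of the RIM backbone (NOT the model of Fig. 1).
DDL = """
CREATE TABLE entity (id INTEGER PRIMARY KEY, class_code TEXT);
CREATE TABLE role (id INTEGER PRIMARY KEY, class_code TEXT,
                   player_entity_id INTEGER REFERENCES entity(id));
CREATE TABLE act (id INTEGER PRIMARY KEY, class_code TEXT,
                  code TEXT, obs_value TEXT);
CREATE TABLE participation (id INTEGER PRIMARY KEY, type_code TEXT,
                            act_id INTEGER REFERENCES act(id),
                            role_id INTEGER REFERENCES role(id));
"""

# Persons (PSN) playing a patient role (PAT) that is the record target (RCT)
# of an observation (OBS) with the requested code and value.
STRATIFY = """
SELECT DISTINCT e.id
FROM entity e
JOIN role r ON r.player_entity_id = e.id AND r.class_code = 'PAT'
JOIN participation p ON p.role_id = r.id AND p.type_code = 'RCT'
JOIN act a ON a.id = p.act_id AND a.class_code = 'OBS'
WHERE e.class_code = 'PSN' AND a.code = ? AND a.obs_value = ?
ORDER BY e.id
"""


def stratify(conn, code, value):
    """Return the ids of persons matching one observation criterion."""
    return [row[0] for row in conn.execute(STRATIFY, (code, value))]


def build_demo():
    """In-memory database with two patients; codes are placeholders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany("INSERT INTO entity VALUES (?, ?)", [(1, "PSN"), (2, "PSN")])
    conn.executemany("INSERT INTO role VALUES (?, ?, ?)",
                     [(10, "PAT", 1), (20, "PAT", 2)])
    conn.executemany("INSERT INTO act VALUES (?, ?, ?, ?)",
                     [(100, "OBS", "BRCA1", "positive"),
                      (200, "OBS", "BRCA1", "negative")])
    conn.executemany("INSERT INTO participation VALUES (?, ?, ?, ?)",
                     [(1000, "RCT", 100, 10), (2000, "RCT", 200, 20)])
    return conn


if __name__ == "__main__":
    conn = build_demo()
    print(stratify(conn, "BRCA1", "positive"))  # [1]
```

The query walks the backbone from the person Entity through the patient Role and its Participation to the Act, so a stratification criterion on a genomic variable becomes a condition on the observation's code and value.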

Methods

Results

Conclusion