Abstract

This paper presents a system for declaratively transforming medical subjects' data into a common data model representation. Our work is part of the “GAAIN” project on Alzheimer's disease data federation across multiple data providers. We present a general-purpose data transformation system that we have developed by leveraging the existing state of the art in data integration and query rewriting. We have further extended the current technology with new formalisms that make a broader range of data transformation tasks expressible, and with new execution methodologies that keep data transformation efficient for disease datasets.

Highlights

  • We present a data transformation system for automatically transforming biomedical data from a data source into a common data model

  • The work in (Detwiler et al, 2009) takes an XQuery-driven, mediation-based approach to Alzheimer’s data integration, but it is restricted to small groups of collaborators who are willing to share their data externally within the group, a significant distinction from the Global Alzheimer’s Association Interactive Network (GAAIN) model

  • The DISCO framework (Marenco et al, 2014) focuses on data integration in support of the “NIF” portal (Marenco et al, 2014), and it integrates at the level of data aggregation, as opposed to the actual data fusion that GAAIN aims for

Summary

Introduction

We present a data transformation system for automatically transforming biomedical data from a data source into a common data model. A requirement of the common-data-model approach is that each dataset from each data partner be transformed into this common data model. This transformation is the most time-consuming and effort-intensive phase in integrating any new dataset, and alleviating that effort is the motivation for our work. The declarative approach makes it faster to transform new datasets, because developers primarily have to provide correct data transformation specifications for the new dataset rather than write custom code for every new data transformation.
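
The contrast between a declarative specification and custom code can be illustrated with a minimal sketch. It is not the GAAIN system or its specification language: the `SPEC` layout, the `transform_record` function, and all field and column names below are hypothetical, chosen only to show one generic routine being driven by a per-dataset mapping.

```python
# Hypothetical declarative specification: each target field of the common
# data model names a source column and, optionally, a value transformation.
# None of these names are taken from GAAIN; they are illustrative only.
SPEC = {
    "subject_id": {"source": "PTID"},
    "age_years": {"source": "AGE", "transform": float},
    "sex": {"source": "GENDER", "transform": {"1": "male", "2": "female"}},
}


def transform_record(record, spec):
    """Apply a mapping specification to one source record (a dict)."""
    out = {}
    for target, rule in spec.items():
        value = record.get(rule["source"])
        transform = rule.get("transform")
        if value is None or transform is None:
            # No value or no transformation: copy through unchanged.
            out[target] = value
        elif isinstance(transform, dict):
            # A dict transformation is treated as a code-to-label lookup.
            out[target] = transform.get(value)
        else:
            # Any other transformation is treated as a callable.
            out[target] = transform(value)
    return out


row = {"PTID": "S001", "AGE": "72.5", "GENDER": "2"}
result = transform_record(row, SPEC)
```

Under this assumption, onboarding a new dataset would mean writing a new `SPEC` while `transform_record` stays unchanged, which is the effort saving the declarative approach aims for.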
