Abstract

Background and objectives: The formats, semantics and operational rules of data processing tasks in genomics (and in health care in general) are highly divergent and can change rapidly. In such an environment, consistent transformation and loading of heterogeneous input data into various target repositories becomes a critical success factor. The objective of the project was to design a new conceptual approach to configurable transformation, de-identification, and submission of health and genomic data sets. The main motivation was to facilitate automated or human-driven data uploading, as well as the consolidation of heterogeneous sources in large genomic or health projects.

Methods: Modern methods of on-demand specialization of generic software components were applied. To specify input and output data and the required data collection activities, we propose a simple data model of flat tables, a domain-oriented graphical interface, and a portable representation of transformations in XML. Using these methods, a prototype of the Configurable Data Collection System (CDCS) was implemented in the Java programming language with Swing graphical interfaces. The core transformation logic was implemented as a library of reusable plugins.

Results: The solution, CDCS, is implemented as a software prototype of a configurable, service-oriented system for semi-automatic data collection, transformation, sanitization and safe uploading to heterogeneous data repositories. To address the dynamic nature of data schemas and data collection processes, the CDCS prototype supports interactive, user-driven configuration of the data collection process and allows its basic functionality to be extended with a wide range of third-party plugins. Notably, the solution also reduces the manual data entry needed for data originally missing in the output data sets.

Conclusions: First experiments and feedback from domain experts confirm that the prototype is flexible, configurable and extensible; runs well on data owners' systems; and does not depend on vendor standards.
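To make the idea of flat-table rows passing through reusable plugins more concrete, the following is a minimal Java sketch under stated assumptions. It is not the CDCS implementation: the names `RowPlugin`, `DropColumns`, `FillDefault` and `PluginPipelineSketch`, the column names, and the sample values are all hypothetical, and the XML representation of transformations and the Swing interface are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plugin contract (an assumption, not the CDCS API):
// one flat-table row in, one transformed row out.
interface RowPlugin {
    Map<String, String> apply(Map<String, String> row);
}

// Hypothetical de-identification step: removes the named columns from a row.
final class DropColumns implements RowPlugin {
    private final List<String> columns;

    DropColumns(List<String> columns) {
        this.columns = columns;
    }

    @Override
    public Map<String, String> apply(Map<String, String> row) {
        Map<String, String> out = new LinkedHashMap<>(row); // copy, keep column order
        out.keySet().removeAll(columns);
        return out;
    }
}

// Hypothetical gap-filling step: supplies a default where a value is missing or empty.
final class FillDefault implements RowPlugin {
    private final String column;
    private final String value;

    FillDefault(String column, String value) {
        this.column = column;
        this.value = value;
    }

    @Override
    public Map<String, String> apply(Map<String, String> row) {
        Map<String, String> out = new LinkedHashMap<>(row);
        String current = out.get(column);
        if (current == null || current.isEmpty()) {
            out.put(column, value);
        }
        return out;
    }
}

final class PluginPipelineSketch {
    // Applies every plugin, in order, to every row of a flat table.
    static List<Map<String, String>> run(List<Map<String, String>> table,
                                         List<RowPlugin> plugins) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> row : table) {
            Map<String, String> current = row;
            for (RowPlugin plugin : plugins) {
                current = plugin.apply(current);
            }
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample row, for illustration only.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("patient_name", "Example Name");
        row.put("sample_id", "S1");
        row.put("consent", "");

        List<RowPlugin> plugins = Arrays.<RowPlugin>asList(
                new DropColumns(Collections.singletonList("patient_name")),
                new FillDefault("consent", "unknown"));

        System.out.println(run(Collections.singletonList(row), plugins));
    }
}
```

In this sketch each plugin returns a new row rather than modifying its input, so the order of the plugin list is the only thing that determines the result. A configurable system could build that list from a stored description instead of hard-coding it.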

