Abstract

To date, the use of synthetic data generation techniques in the health and wellbeing domain has been mainly limited to research activities. Although several open source and commercial packages have been released, they are oriented toward generating synthetic data as a standalone data preparation process and are not integrated into a broader analysis or experiment testing workflow. In this context, the VITALISE project is working to harmonize Living Lab research and data capture protocols and to provide industrial and scientific communities with controlled processing access to captured data. In this paper, we present the initial design and implementation of our synthetic data generation approach in the context of the VITALISE Living Lab controlled data processing workflow, together with identified challenges and future developments. The utility of the proposed workflow has been validated by uploading data captured from Living Labs, generating synthetic data from them, developing analyses locally with synthetic data, and then executing those analyses remotely with real data. Results have shown that the presented workflow helps accelerate research on artificial intelligence while ensuring compliance with data protection laws. The presented approach has demonstrated how state-of-the-art synthetic data generation techniques can be adopted for real-world applications.

Highlights

  • The results obtained when applying the VITALISE Living Labs (LLs) controlled data processing workflow to a real-world usage example are presented to evaluate the incorporation of synthetic data generation (SDG) techniques in the presented workflow

  • The results obtained from the workflow execution are discussed, and the main findings, limitations, and future work of the developed research are presented

  • Local analyses have been performed with each obtained synthetic data (SD) asset, and the remote execution of the same analyses with real data (RD) has been requested
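
The local-then-remote pattern named in the highlights can be illustrated with a minimal sketch. It is not the VITALISE implementation: the multivariate Gaussian generator, the function names (`fit_gaussian_generator`, `sample_synthetic`, `analysis`), and the simulated stand-in for RD are all assumptions made for illustration. The sketch fits a simple generator to RD, draws SD from it, and runs the same analysis function on both, mimicking an analysis developed locally on SD and then executed on RD.

```python
import numpy as np


def fit_gaussian_generator(real: np.ndarray):
    """Fit a multivariate Gaussian to tabular data (rows are records).

    Hypothetical stand-in for an SDG model; real SDG techniques are richer.
    """
    mean = real.mean(axis=0)
    cov = np.cov(real, rowvar=False)
    return mean, cov


def sample_synthetic(model, n_rows: int, rng: np.random.Generator) -> np.ndarray:
    """Draw n_rows synthetic records from the fitted Gaussian."""
    mean, cov = model
    return rng.multivariate_normal(mean, cov, size=n_rows)


def analysis(data: np.ndarray) -> float:
    """Example analysis: Pearson correlation between the first two columns."""
    return float(np.corrcoef(data[:, 0], data[:, 1])[0, 1])


rng = np.random.default_rng(0)

# Simulated stand-in for RD (assumption: two correlated numeric variables).
x = rng.normal(size=500)
real = np.column_stack([x, 0.8 * x + rng.normal(scale=0.5, size=500)])

# Generate SD from a model fitted on RD.
model = fit_gaussian_generator(real)
synthetic = sample_synthetic(model, n_rows=500, rng=rng)

# The analysis is developed against SD, then the same code is run on RD.
local_result = analysis(synthetic)
remote_result = analysis(real)
```

In this toy setting both calls run in one process; in a controlled processing workflow, only the call on SD would run on the researcher's machine, and the call on RD would be submitted for remote execution.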


Summary

Introduction

Synthetic data (SD) is data generated artificially by a mathematical model to replicate the distributions and structures of some real data (RD) [1]. In this context, synthetic data generation (SDG) has been widely researched within health and wellbeing domains for different data types, including biomedical signals [2–4], medical images [5–8], time-series smart-home activity data [9–12], and EHR tabular data [13–21]. Some of these studies used SDG to preserve privacy, ensuring a secure data exchange [3,4,10,12–19,21], while others used it to augment RD for training different ML models, either seeking to balance classes or to achieve more data to improve

