SynSys: A Synthetic Data Generation System for Healthcare Applications.

Jessamyn Dahmen, Diane Cook

doi:10.3390/s19051181

Open Access

Journal: Sensors
Publication Date: Mar 8, 2019
Citations: 112
License type: CC BY 4.0

Affiliation: Washington State University

Abstract

Creating realistic synthetic behavior-based sensor data is an important part of testing machine learning techniques for healthcare applications. Many existing approaches for generating synthetic data are limited in complexity and realism. We introduce SynSys, a machine learning-based synthetic data generation method, to address these limitations. We use this method to generate synthetic time series data composed of nested sequences, using hidden Markov models and regression models that are initially trained on real datasets. We test our synthetic data generation technique on a real annotated smart home dataset. We use time series distance measures as a baseline to determine how realistic the generated data is compared to real data, and we demonstrate that SynSys produces more realistic data in terms of distance than random data generation, data from another home, or data from another time period. Finally, we apply our technique to the problem of generating data when only a small amount of ground truth data is available. Using semi-supervised learning, we demonstrate that SynSys improves activity recognition accuracy compared to using the small amount of real data alone.
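
To make the idea of nested sequences concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy hidden Markov model in which the hidden states are activities and each activity emits one sensor event. Every name and probability in it (ACTIVITIES, START, TRANSITIONS, EMISSIONS, sample_day, sample_days) is hypothetical and chosen for illustration only. In SynSys the models are trained on real data, and the regression component is not shown here. Restarting each day from the START distribution is one simple way to imitate a reset period that enforces day structure.

```python
import random

# Hypothetical toy model: the activities, sensors, and probabilities below
# are illustrative assumptions, not values learned from any real dataset.
ACTIVITIES = ["sleep", "cook", "relax"]
START = {"sleep": 0.6, "cook": 0.2, "relax": 0.2}
TRANSITIONS = {
    "sleep": {"sleep": 0.7, "cook": 0.2, "relax": 0.1},
    "cook": {"sleep": 0.1, "cook": 0.5, "relax": 0.4},
    "relax": {"sleep": 0.3, "cook": 0.2, "relax": 0.5},
}
EMISSIONS = {
    "sleep": {"bedroom": 0.90, "kitchen": 0.05, "living_room": 0.05},
    "cook": {"bedroom": 0.05, "kitchen": 0.85, "living_room": 0.10},
    "relax": {"bedroom": 0.10, "kitchen": 0.10, "living_room": 0.80},
}


def draw(dist, rng):
    """Sample one key from a {outcome: probability} mapping."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def sample_day(n_events, rng):
    """Sample (activity, sensor) pairs for one synthetic day."""
    # Each day restarts from the START distribution (a simple "reset").
    state = draw(START, rng)
    events = []
    for _ in range(n_events):
        # Emit a sensor event from the current hidden activity ...
        events.append((state, draw(EMISSIONS[state], rng)))
        # ... then move to the next activity.
        state = draw(TRANSITIONS[state], rng)
    return events


def sample_days(n_days, n_events, seed=0):
    """Sample several independent days with a seeded generator."""
    rng = random.Random(seed)
    return [sample_day(n_events, rng) for _ in range(n_days)]
```

Calling sample_days(3, 10) returns three lists of ten (activity, sensor) pairs. Fixing the seed makes the output reproducible, which helps when comparing generated sequences against real ones with a distance measure.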

Highlights

When creating models from sensor data, machine learning algorithms need to be trained and validated using diverse datasets, including some with known patterns and distributions.
We base the fundamentals of our work on earlier efforts that use machine learning and modeling-based methods to improve the realism of synthetic human behavior data.
This is intended to demonstrate how SynSys would compare to an alternative synthetic data generation method that does not use a combination of hidden Markov models (HMMs), Ridge Regression, and a reset period to enforce day structure.

Summary

Introduction

When creating models from sensor data, machine learning algorithms need to be trained and validated using diverse datasets, including some with known patterns and distributions. Many types of real-world sensor-driven datasets are limited in availability and variety. This can introduce difficulties when employing machine learning techniques that rely on large labeled training datasets. To address this problem, synthetic data can be created for initial testing and validation of novel machine learning techniques. We introduce a new method for generating synthetic sensor data that reflects the human behavior found in real sensor datasets. We base the fundamentals of our work on earlier efforts that use machine learning and modeling-based methods to improve the realism of synthetic human behavior data.
