Abstract

Creating realistic synthetic behavior-based sensor data is an important part of testing machine learning techniques for healthcare applications. Existing approaches for generating synthetic data are often limited in complexity and realism. We introduce SynSys, a machine learning-based synthetic data generation method, to address these limitations. We use this method to generate synthetic time series data composed of nested sequences, using hidden Markov models and regression models that are initially trained on real datasets. We test our synthetic data generation technique on a real annotated smart home dataset. We use time series distance measures as a baseline to determine how realistic the generated data is compared to real data, and we demonstrate that SynSys produces more realistic data in terms of distance than random data generation, data from another home, and data from another time period. Finally, we apply our synthetic data generation technique to the problem of generating data when only a small amount of ground truth data is available. Using semi-supervised learning, we demonstrate that SynSys is able to improve activity recognition accuracy compared to using the small amount of real data alone.
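
As a rough illustration of the hidden Markov model component mentioned in the abstract, the sketch below samples a labeled sequence of sensor events from a small hand-written HMM. It is not the SynSys implementation: the activity labels, sensor names, probabilities, and the `sample_hmm` helper are all invented for this example, the parameters are fixed by hand rather than trained on real data, and the regression models and nested sequence structure are omitted.

```python
import random

# All labels and probabilities below are hypothetical, chosen only to
# illustrate HMM sampling; they are not taken from the SynSys paper.
START = {"sleep": 0.6, "cook": 0.2, "relax": 0.2}

# Transition probabilities between hidden activity states.
TRANS = {
    "sleep": {"sleep": 0.7, "cook": 0.2, "relax": 0.1},
    "cook": {"sleep": 0.1, "cook": 0.5, "relax": 0.4},
    "relax": {"sleep": 0.3, "cook": 0.2, "relax": 0.5},
}

# Emission probabilities: which sensor fires given the current activity.
EMIT = {
    "sleep": {"bedroom_motion": 0.90, "kitchen_motion": 0.05, "living_motion": 0.05},
    "cook": {"bedroom_motion": 0.05, "kitchen_motion": 0.85, "living_motion": 0.10},
    "relax": {"bedroom_motion": 0.10, "kitchen_motion": 0.10, "living_motion": 0.80},
}


def draw(dist, rng):
    """Draw one key from a {key: probability} mapping."""
    keys = list(dist)
    weights = [dist[k] for k in keys]
    return rng.choices(keys, weights=weights, k=1)[0]


def sample_hmm(length, seed=0):
    """Return a list of (activity, sensor) pairs sampled from the toy HMM."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    events = []
    state = draw(START, rng)
    for _ in range(length):
        events.append((state, draw(EMIT[state], rng)))
        state = draw(TRANS[state], rng)
    return events


if __name__ == "__main__":
    for activity, sensor in sample_hmm(8, seed=42):
        print(activity, sensor)
```

In a trained setting, the three probability tables would be estimated from annotated real data instead of being written by hand; the sampling loop itself would stay the same.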

Highlights

  • When creating models from sensor data, machine learning algorithms need to be trained and validated using diverse datasets, including some with known patterns and distributions

  • We base the fundamentals of our work on earlier efforts that use machine learning and modeling-based methods to improve the realism of synthetic human behavior data

  • This is intended to demonstrate how SynSys would compare to an alternative synthetic data generation method that does not use a combination of hidden Markov models (HMMs), ridge regression, and a reset period to enforce day structure


Summary

Introduction

When creating models from sensor data, machine learning algorithms need to be trained and validated using diverse datasets, including some with known patterns and distributions. Many types of real-world sensor-driven datasets are limited in availability and variety. This can introduce difficulties when employing machine learning techniques that rely on large labeled training datasets. To address this problem, synthetic data can be created for initial testing and validation of novel machine learning techniques. We introduce a new method for generating synthetic sensor data that reflects the human behavior found in real sensor datasets. We base the fundamentals of our work on earlier efforts that use machine learning and modeling-based methods to improve the realism of synthetic human behavior data.

