Human activity recognition (HAR) is an active research field that has seen great success in recent years due to advances in sensory data collection methods and activity recognition systems. Deep artificial intelligence (AI) models have contributed to the success of HAR systems lately although still suffering from limitations such as data scarcity, the high costs of labelling data instances and datasets’ imbalance and bias. The temporal nature of human activity data, represented as time series data, impose an additional challenge to using AI models in HAR because most state-of-the-art models do not account for the time component of the data instances. These limitations have inspired the time-series research community to design generative models for sequential data but very little work has been done to evaluate the quality of such models. In this work, we conduct a comparative quality analysis of three generative models for time-series data, using a case study in which we aim to generate sensory human activity data from a seed public dataset. Additionally, we adapt and clearly explain four evaluation methods of synthetic time-series data from the literature and apply them to assess the quality of the synthetic activity data we generate. We show experimentally that high quality human activity data can be generated using deep generative models, and the synthetic data can thus be used in HAR systems to augment real activity data. We also demonstrate that the chosen evaluation methods effectively ensure that the generated data meets the essential quality benchmarks of realism, diversity, coherence and utility. Our findings suggest that using deep generative models to produce synthetic human activity data can potentially address challenges related to data scarcity, biases, and expensive labeling. This holds promise for enhancing the efficiency and reliability of HAR systems.
Read full abstract