Abstract

Synthetic data generation is critical in machine and deep learning research for overcoming shortages of samples and small dataset sizes. Various algorithms, including generative adversarial network (GAN) and autoencoder models, have been applied to generate artificial datasets in previous studies. In this study, we propose a deep-learning-based synthetic data generation framework for tabular datasets collected from cognitive psychology behavioral experiments. Tabular datasets from the Stroop task were used to develop our framework. Because of the relatively small sample size (N=102) of the dataset used in our study, we used a pre-trained generative adversarial network model to complement the size of the dataset. Furthermore, we proposed and applied five evaluation methods with statistical tests (overlapped sample test, constraint reflection test, correlation reflection test, distribution distance test, and feature distance test) to validate generation performance at three levels of table structure (instance-level, feature-level, and whole-set-level evaluations). The proposed framework with a fine-tuned generative adversarial network was compared with a random generation method to verify generation performance, including how well the statistical characteristics of the original datasets were represented. We found that, across the five evaluation methods, the datasets generated by the proposed framework exhibited statistical characteristics more similar to those of the original dataset than the randomly generated datasets did. The results of this study provide not only generation algorithms for tabular cognitive psychological datasets but also a solution to the sample size issue for researchers.

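The paper's framework is not reproduced here, but the core generation step can be sketched with an off-the-shelf tabular GAN. The snippet below is a minimal illustration using the open-source CTGAN library as a stand-in for the pre-trained, fine-tuned generator described above; the file name and the column names (`reaction_time`, `accuracy`, `condition`) are hypothetical placeholders for Stroop-task features, not the study's actual schema.

```python
# Minimal sketch: fit a tabular GAN on a small behavioral dataset and
# sample synthetic rows. CTGAN stands in for the fine-tuned generator
# in the paper; the file and columns are hypothetical placeholders.
import pandas as pd
from ctgan import CTGAN

# Original (small) tabular dataset, e.g. N=102 participants with
# columns such as reaction_time, accuracy, condition
real = pd.read_csv("stroop_data.csv")   # hypothetical file name
discrete_columns = ["condition"]        # e.g. congruent / incongruent

model = CTGAN(epochs=300)
model.fit(real, discrete_columns)

# Complement the small sample with synthetic rows
synthetic = model.sample(1000)
print(synthetic.head())
```
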
Highlights

  • Sample or dataset size is considered a critical factor for various data analysis methodologies, including statistical and machine learning methods [1,2,3,4]

  • We propose a deep-learning-based synthetic data generation framework for tabular datasets collected from cognitive psychology behavioral experiments

  • We found that, across five evaluation methods, the datasets generated by the proposed framework exhibited statistical characteristics more similar to those of the original dataset than randomly generated datasets did (see the illustrative sketch after this list)

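To make the evaluation side concrete, the sketch below shows one plausible way to operationalize two of the five methods: a per-feature Kolmogorov-Smirnov statistic for the distribution distance test and a correlation-matrix difference for the correlation reflection test. These are generic stand-ins based on common practice, not the paper's exact test procedures, and the function names are ours.

```python
# Illustrative checks for two of the five evaluation methods:
# distribution distance (per-feature two-sample KS statistic) and
# correlation reflection (difference between correlation matrices).
# Generic stand-ins, not the paper's exact procedures.
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

def distribution_distance(real: pd.DataFrame, synth: pd.DataFrame) -> pd.Series:
    """Two-sample KS statistic for each shared numeric column."""
    cols = real.select_dtypes("number").columns.intersection(synth.columns)
    return pd.Series({c: ks_2samp(real[c], synth[c]).statistic for c in cols})

def correlation_reflection(real: pd.DataFrame, synth: pd.DataFrame) -> float:
    """Mean absolute difference between the two correlation matrices."""
    cols = real.select_dtypes("number").columns.intersection(synth.columns)
    diff = real[cols].corr() - synth[cols].corr()
    return float(np.abs(diff.values).mean())
```
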
Introduction

Sample or dataset size is considered a critical factor for various data analysis methodologies, including statistical and machine learning methods [1,2,3,4]. In statistical analysis, many tests require an appropriate sample size to ensure the power and reliability of the results [5, 6]. Lachin et al. highlighted the importance of sample size determination and power analysis in clinical trials [7]. MacCallum et al. introduced a framework for determining the minimum sample size needed for adequate power in empirical behavioral research [8]. An adequate dataset size is likewise essential for machine and deep learning methodologies. Sun et al. suggested a relationship between dataset size and model performance in visual deep learning models [13].
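
As a concrete illustration of the power analysis these studies describe, the following sketch computes the minimum per-group sample size for an independent-samples t-test using statsmodels; the effect size, alpha, and power values are conventional assumptions, not figures from the cited work.

```python
# A priori sample size determination for an independent-samples t-test,
# illustrating the kind of power analysis discussed above.
# Effect size, alpha, and power are conventional assumed values.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5,  # medium effect (Cohen's d)
                                   alpha=0.05,       # significance level
                                   power=0.80)       # desired statistical power
print(f"Required sample size per group: {n_per_group:.1f}")
```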
