Techniques to produce and evaluate realistic multivariate synthetic data

John Heine, Erin E E Fowler, Anders Berglund, Michael J Schell, Steven Eschrich

doi:10.1038/s41598-023-38832-0

Abstract

Data modeling requires a sufficient sample size for reproducibility. A small sample size can inhibit model evaluation. A synthetic data generation technique addressing this small sample size problem is evaluated: from the space of arbitrarily distributed samples, a subgroup (class) has a latent multivariate normal characteristic; synthetic data can be generated from this class with univariate kernel density estimation (KDE); and synthetic samples are statistically like their respective samples. Three samples (n = 667) were investigated with 10 input variables (X). KDE was used to augment the sample size in X. Maps produced univariate normal variables in Y. Principal component analysis in Y produced uncorrelated variables in T, where the probability density functions were approximated as normal and characterized; synthetic data was generated with normally distributed univariate random variables in T. Reversing each step produced synthetic data in Y and X. All samples were approximately multivariate normal in Y, permitting the generation of synthetic data. Probability density function and covariance comparisons showed similarity between samples and synthetic samples. A class of samples has a latent normal characteristic. For such samples, this approach offers a solution to the small sample size problem. Further studies are required to understand this latent class.
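The X to Y to T pipeline described in the abstract, and its reversal, can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on several assumptions: the univariate maps are built from the cumulative distribution of a Gaussian KDE with a Silverman-type bandwidth, the maps are inverted by interpolation on a grid, and the names `kde_cdf_grid` and `synthesize` are hypothetical helpers introduced only for this illustration.

```python
import math
from statistics import NormalDist

import numpy as np

# Standard normal CDF and its inverse, vectorized over NumPy arrays.
_phi = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))), otypes=[float])
_phi_inv = np.vectorize(NormalDist().inv_cdf, otypes=[float])


def kde_cdf_grid(x, n_grid=256):
    """Return (grid, cdf): the CDF of a univariate Gaussian KDE of x on a grid.

    The bandwidth rule (1.06 * std * n^(-1/5)) is an assumption of this sketch.
    """
    h = 1.06 * x.std(ddof=1) * len(x) ** (-0.2)
    grid = np.linspace(x.min() - 4 * h, x.max() + 4 * h, n_grid)
    # KDE CDF = average of the normal CDFs centred on each observation.
    cdf = _phi((grid[:, None] - x[None, :]) / h).mean(axis=1)
    return grid, np.clip(cdf, 1e-9, 1 - 1e-9)


def synthesize(X, n_syn, seed=0):
    """Generate n_syn synthetic rows that mimic the sample X (rows = observations)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]

    # X -> Y: map each variable to an approximately standard normal variable.
    maps = [kde_cdf_grid(X[:, j]) for j in range(d)]
    Y = np.column_stack(
        [_phi_inv(np.interp(X[:, j], g, c)) for j, (g, c) in enumerate(maps)]
    )

    # Y -> T: PCA yields uncorrelated components with variances `evals`.
    mu = Y.mean(axis=0)
    evals, V = np.linalg.eigh(np.cov(Y, rowvar=False))
    evals = np.clip(evals, 0.0, None)

    # Draw independent univariate normals in T, then reverse each step.
    T_syn = rng.standard_normal((n_syn, d)) * np.sqrt(evals)
    Y_syn = T_syn @ V.T + mu  # T -> Y
    X_syn = np.column_stack(
        [np.interp(_phi(Y_syn[:, j]), c, g) for j, (g, c) in enumerate(maps)]
    )  # Y -> X via the inverse univariate maps
    return X_syn
```

Calling `synthesize(X, n_syn)` on an array `X` of shape (n, d) returns an array of shape (n_syn, d). In the spirit of the covariance comparison mentioned in the abstract, one simple check is to compare `np.cov(X, rowvar=False)` with `np.cov(X_syn, rowvar=False)`. The sketch is only meaningful when the sample is approximately multivariate normal in Y, which is the latent characteristic the abstract refers to.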

Journal: Scientific Reports
Publication Date: Jul 28, 2023
Citations: 1
License type: CC BY 4.0

Similar Papers

Generalized nonlinear discriminant analysis and its small sample size problems
Li Zhang ... Pei-Chann Chang
Neurocomputing | VOL. 74
27 Oct 2010

Empirically-derived synthetic populations to mitigate small sample sizes.
Erin E Fowler ... Michael J Schell
Journal of Biomedical Informatics | VOL. 105
12 Mar 2020

Subspace Regularized Linear Discriminant Analysis for Small Sample Size Problems
Zhidong Wang ... Wuyi Yang
01 Jan 2012

Discriminant common vectors versus neighbourhood components analysis and Laplacianfaces: A comparative study in small sample size problem
Jun Liu ... Songcan Chen
Image and Vision Computing | VOL. 24
04 Jan 2006
