Generation Of Synthetic Samples Research Articles

We aimed to generate a synthetic sample (SS) of 1 million individuals that reflects the characteristics of the population recorded in the Health Survey for England (HSE). We used data from the HSE to determine the age- and gender-dependent distributions of continuous risk factors (height, weight, BMI, systolic blood pressure, total and HDL cholesterol and their ratio, number of cigarettes/day and units of alcohol/week) and the prevalence of binary risk factors (smoking status, diabetes). Spearman rank correlations, including age and gender, were determined for these risk factors. A table of normally distributed random numbers was generated, and Cholesky decomposition was used to replicate the observed Spearman rank correlations in that table. Rank correlations that included binary variables were recalibrated to adjust for the numerous tied values. The sample was then generated by a reverse look-up of the gamma distribution value at the random percentile for continuous variables, or by setting a binary variable to 1 when the random percentile fell below the prevalence threshold. Differences between coefficients were no more than 0.5% for any continuous variable. The prevalence of binary factors in the SS closely matched that in the HSE sample. Smoking prevalence was 18.8% and 16.7% in the SS versus 18.4% and 16.5% in the HSE sample, for males and females respectively. Prevalence of diabetes in the SS was 13.3% and 7.7% versus 13.2% and 7.8%, and of cardiovascular disease 17.6% and 14.1% versus 18.2% and 14.6%. Comparing the 25th, 50th and 75th percentiles, the maximum differences between the original and synthetic values for BMI and TC/HDL ratio were 0.6 kg/m² and 0.3 respectively. Our approach generates large synthetic samples with risk factor distributions closely matching those of the real HSE population. Such a sample can be used to model the likely impact of new therapies or to predict mortality.
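The pipeline described in this abstract (correlated normal random numbers via Cholesky decomposition, then a percentile look-up per variable) can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `synthetic_sample`, the two-variable setup, and all parameter values are hypothetical. The Spearman-to-Pearson conversion `2*sin(pi*rho/6)` is a standard result for bivariate normal data, not something stated in the abstract. The gamma look-up is approximated here with a Monte Carlo quantile table to keep the example NumPy-only. The recalibration for tied binary values mentioned in the abstract is not reproduced.

```python
import numpy as np

def synthetic_sample(n, spearman, gamma_shape, gamma_scale, prevalence, seed=0):
    """Return (continuous, binary, u) for n synthetic individuals.

    continuous: gamma-distributed risk factor (hypothetical parameters)
    binary:     0/1 risk factor with the requested prevalence
    u:          the correlated percentiles both columns were derived from
    """
    rng = np.random.default_rng(seed)

    # Table of independent, normally distributed random numbers.
    z = rng.standard_normal((n, 2))

    # Assumption: convert the target Spearman correlation to the Pearson
    # correlation of the normal scores (relation for bivariate normals).
    pearson = 2.0 * np.sin(np.pi * spearman / 6.0)
    corr = np.array([[1.0, pearson], [pearson, 1.0]])

    # Cholesky factor: chol @ chol.T == corr, so z @ chol.T is correlated.
    chol = np.linalg.cholesky(corr)
    zc = z @ chol.T

    # Turn each column into percentiles in (0, 1) through its ranks.
    ranks = zc.argsort(axis=0).argsort(axis=0)
    u = (ranks + 0.5) / n

    # Reverse look-up of the gamma value at each percentile, approximated
    # with an empirical quantile table drawn from the same gamma distribution.
    table = rng.gamma(gamma_shape, gamma_scale, size=200_000)
    continuous = np.quantile(table, u[:, 0])

    # Binary variable is 1 when the percentile falls below the prevalence.
    binary = (u[:, 1] < prevalence).astype(int)
    return continuous, binary, u
```

A call such as `synthetic_sample(20000, 0.5, 25.0, 1.1, 0.18)` would give a BMI-like variable with mean near 27.5 and a binary factor with 18% prevalence; these numbers are illustrative and are not taken from the HSE. Where SciPy is available, `scipy.stats.gamma.ppf` gives the inverse gamma distribution directly and could replace the quantile table.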

This article describes a methodology to generate a large database of synthetic samples from a small set of original online handwriting specimens. The overall paradigm is based on the Kinematic Theory of rapid human movements and its sigma-lognormal model. The principal contributions of the present study include (i) development of a strategy for sigma-lognormal model-based generation of synthetic samples from real online handwriting samples of arbitrary scripts captured by arbitrary relevant devices and (ii) verification of the structural similarities, including the naturalness of such synthetic prototypes, through various human perception experiments, computer evaluations and statistical hypothesis testing. A database consisting of a large number of online synthetic handwritten word samples is used to train and evaluate the performance of three existing automatic online handwriting recognition systems. Training on a combined set of original and synthetic samples improves recognition accuracy on the test set, and the combined training set is useful irrespective of the nature of the feature set used (online, offline or combined). Although the proposed method was primarily developed for and applied to the design of an online handwriting sample database of a popular Indian script, Bangla, it can be applied to the generation of large databases of any script, for example English, Chinese or Arabic.

Articles published on Generation Of Synthetic Samples

PRM66 - Synthetic Sample Generation Representing the English Population Using Spearman Rank Correlation and Cholesky Decomposition

REMEDIAL-HwR: Tackling multilabel imbalance through label decoupling and data resampling hybridization

A sigma-lognormal model-based approach to generating large synthetic online handwriting sample databases

Using Genetic Approach for Learning from Imbalanced Text Corpora

Estimation of seismic building structural types using multi-sensor remote sensing and machine learning techniques

Evaluation of technique to overcome small dataset problems during neural-network based contamination classification of packaged beef using integrated olfactory sensor system

Neural-Network-Based Classification of Meat: Evaluation of Techniques to Overcome Small Dataset Problems

Initialization for generating single‐site and multisite low‐order periodic autoregressive and moving average processes
