Abstract

The objective was to generate a synthetic sample of individuals whose mean characteristics reflect those of a population in clinical research. We developed a stochastic resampling technique in R to generate a semi-random sample of people with characteristics that match those of the control group in the Scandinavian Simvastatin Survival Study (4S). The sample was matched on binary variables (gender ratio, age, smoking status, diabetes) and on continuous variables (BMI, systolic blood pressure, total cholesterol (TC) to HDL cholesterol ratio, number of cigarettes smoked per day, and units of alcohol consumed per week). The algorithm successfully generated a sample of 2,222 individuals with characteristics closely matching those of the 4S study control group, and the descriptive statistics generated for the synthetic sample matched the target sample to 2 decimal places. The samples were well matched for all continuous variables. The average values reported in the 4S study for BMI, systolic BP, TC, and HDL were 26.0 (SD = 3.3), 139.1 (SD = 19.6), 6.7 (SD = 0.7), and 1.4 (SD = 0.3), respectively. In the synthetic sample the corresponding averages were 26.0 (SD = 4.2, 95% CI [17.84, 34.18]), 139.1 (SD = 20.1, 95% CI [99.62, 178.52]), 6.7 (SD = 1.1, 95% CI [4.54, 8.94]), and 1.4 (SD = 0.3, 95% CI [0.75, 2.06]), respectively. The only notable difference in the data summary was the range of TC, which was 5.01 to 25 in the 4S study control group but 5.01 to 12 in the synthetic sample. This new method was successful in generating synthetic samples that are comparable to the originals in aggregate. Such samples can be used to model the likely impact of new therapies or to predict mortality for various sub-groups, and should be a useful tool in the planning and preparation of clinical trials.
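
The idea of matching a synthetic sample to published summary statistics can be illustrated with a minimal sketch. This is not the authors' stochastic resampling technique, which was written in R and is not described in detail here. It swaps in a simpler technique, moment matching by rescaling, written in Python with the standard library only. The function names, the seed, and the smoking proportion are hypothetical and chosen for illustration; the means, SDs, and sample size are the ones quoted in the abstract.

```python
import random
import statistics


def synthetic_continuous(n, target_mean, target_sd, rng):
    """Draw n standard-normal values, then rescale them so the
    sample mean and sample SD equal the targets."""
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m = statistics.fmean(draws)
    s = statistics.stdev(draws)  # sample SD (n - 1 denominator)
    return [target_mean + target_sd * (x - m) / s for x in draws]


def synthetic_binary(n, proportion, rng):
    """Return a shuffled 0/1 list whose share of 1s is as close
    to `proportion` as the sample size allows."""
    ones = round(n * proportion)
    values = [1] * ones + [0] * (n - ones)
    rng.shuffle(values)
    return values


rng = random.Random(2222)  # arbitrary fixed seed, for reproducibility only
n = 2222                   # sample size reported in the abstract

# Means and SDs reported for the 4S control group in the abstract.
targets = {
    "bmi": (26.0, 3.3),
    "sbp": (139.1, 19.6),
    "tc": (6.7, 0.7),
    "hdl": (1.4, 0.3),
}
sample = {
    name: synthetic_continuous(n, mean, sd, rng)
    for name, (mean, sd) in targets.items()
}

# Hypothetical proportion, for illustration only; not taken from the 4S study.
sample["smoker"] = synthetic_binary(n, 0.25, rng)

# Print the achieved mean and SD of each continuous variable.
for name in targets:
    values = sample[name]
    print(name, round(statistics.fmean(values), 2), round(statistics.stdev(values), 2))
```

Each variable is generated independently in this sketch, so correlations between characteristics are not reproduced, and no range limits (such as the TC bounds mentioned above) are enforced. A resampling approach would need extra steps to handle both.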
