Abstract

An emerging area in data science that has lately gained attention is the virtual population (VP) and synthetic data generation. This field has the potential to significantly affect the healthcare industry by providing a means to augment clinical research databases that have a shortage of subjects. The current study provides a comparative analysis of five distinct approaches for creating virtual data populations from real patient data. The data set utilized for the current analyses involved clinical data collected among patients scheduled for elective coronary artery bypass graft surgery (CABG). To that end, the five computational techniques employed to augment the given dataset were: (i) Tabular Preset, (ii) Gaussian Copula Model (iii) Generative Adversarial Network based (GAN) Deep Learning data synthesizer (CTGAN), (iv) a variation of the CTGAN Model (Copula GAN), and (v) VAE-based Deep Learning data synthesizer (TVAE). The performance of these techniques was assessed against their effectiveness in producing high-quality virtual data. For this purpose, dataset correlation matrices, cosine similarity distance, density histograms, and kernel density estimation are employed to perform a comparative analysis of each attribute and the respective synthetic equivalent. Our findings demonstrate that Gaussian Copula Model prevails in creating virtual data with consistent distributions (Kolmogorov-Smirnov (KS) and Chi-Squared (CS) tests equal to 0.9 and 0.98, respectively) and correlation patterns (average cosine similarity equals to 0.95).Clinical Relevance- It has been shown that the use of a VP can increase the predictive performance of a ML model, i.e., above using a smaller non-augmented population.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.