Abstract

Large-scale high-throughput transcriptome sequencing data holds significant value in biomedical research. However, practical challenges such as difficulty in sample acquisition often limit the availability of large sample sizes, leading to decreased reliability of the analysis results. In practice, generative deep learning models, such as Generative Adversarial Networks (GANs) and Diffusion Models (DMs), have been proven to generate realistic data and may be used to solve this promblem. In this study, we utilized bulk RNA-Seq gene expression data to construct different generative models with two data preprocessing methods: Min-Max-GAN, Z-Score-GAN, Min-Max-DM, and Z-Score-DM. We demonstrated that the generated data from the Min-Max-GAN model exhibited high similarity to real data, surpassing the performance of the other models significantly. Furthermore, we trained the models on the largest dataset available to date, achieving MMD (Maximum Mean Discrepancy) of 0.030 and 0.033 on the training and independent datasets, respectively. Through SHAP (SHapley Additive exPlanations) explanations of our generative model, we also enhanced our model's credibility. Finally, we applied the generated data to data augmentation and observed a significant improvement in the performance of classification models. In summary, this study establishes a GAN-based approach for generating bulk RNA-Seq gene expression data, which contributes to enhancing the performance and reliability of downstream tasks in high-throughput transcriptome analysis.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.