Integrating multiple omics datasets can significantly advance our understanding of disease mechanisms, physiology, and treatment responses. However, a major challenge in multi-omics studies is the disparity in sample sizes across different datasets, which can introduce bias and reduce statistical power. To address this issue, we propose a novel framework, OmicsNMF, designed to impute missing omics data and enhance disease phenotype prediction. OmicsNMF integrates Generative Adversarial Networks (GANs) with Non-Negative Matrix Factorization (NMF). NMF is a well-established method for uncovering underlying patterns in omics data, while GANs enhance the imputation process by generating realistic data samples. This synergy aims to more effectively address sample size disparity, thereby improving data integration and prediction accuracy. For evaluation, we focused on predicting breast cancer subtypes using the imputed data generated by our proposed framework, OmicsNMF. Our results indicate that OmicsNMF consistently outperforms baseline methods. We further assessed the quality of the imputed data through survival analysis, revealing that the imputed omics profiles provide significant prognostic power for both overall survival and disease-free status. Overall, OmicsNMF effectively leverages GANs and NMF to impute missing samples while preserving key biological features. This approach shows potential for advancing precision oncology by improving data integration and analysis. Source code is available at: https://github.com/compbiolabucf/OmicsNMF.
Read full abstract