Abstract

Precise prognostic classification of patients, together with the identification of survival subgroups and their associated genes, can be an important clinical reference when designing treatment strategies for cancer patients. Multi-omics data integration techniques are powerful tools for achieving this goal. This study aimed to introduce a machine learning method that integrates three types of biological data, and to compare its performance with that of two other methods in identifying the survival dependency of patients. The data comprised TCGA RNA-seq gene expression, DNA methylation, and clinical data from 368 patients with colon cancer; an independent external validation data set containing 232 samples was also used. Three methods, namely a hyper-parameter optimized autoencoder (HPOAE), a normal autoencoder, and penalized principal component analysis (PPCA), were used for simultaneous data integration and estimation under a Cox proportional hazards model. The HPOAE appeared to outperform the other methods, with a Log Rank (Mantel-Cox) statistic of 14.27 ± 2 and a Breslow (Generalized Wilcoxon) statistic of 13.13 ± 1. Ten miRNAs, 11 methylated genes, and 28 mRNAs were identified using a marginal importance cutoff of > 0.95. The study demonstrated that hsa-miR-485-5p targets both ZMYM1 and TP53, the latter of which has been associated with cancer in numerous previous studies. Furthermore, compared to the other methods, the HPOAE exhibited a greater capacity for identifying survival subgroups and their associated genes in patients with colon cancer. However, all of these results were obtained computationally, and clinical and experimental studies are needed to validate them.
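The separation between survival subgroups is reported above as a log-rank (Mantel-Cox) statistic. As a minimal sketch of what that number measures, the function below computes the textbook two-group log-rank chi-square statistic for right-censored data, assuming exactly two subgroups. It is not the authors' code, and the function names `logrank_statistic` and `chi2_df1_pvalue` are hypothetical.

```python
import math


def logrank_statistic(times_a, events_a, times_b, events_b):
    """Two-group log-rank (Mantel-Cox) chi-square statistic with 1 degree of freedom.

    times_*  : follow-up times for each patient in the group
    events_* : 1 if the event (e.g. death) was observed, 0 if the patient was censored
    """
    # Pool both groups, tagging each observation with its group label (0 = A, 1 = B).
    data = [(t, e, 0) for t, e in zip(times_a, events_a)] + [
        (t, e, 1) for t, e in zip(times_b, events_b)
    ]
    event_times = sorted({t for t, e, _ in data if e == 1})

    o_minus_e = 0.0  # running sum of (observed - expected) events in group A
    variance = 0.0   # running sum of hypergeometric variances
    for t in event_times:
        # Patients still at risk just before time t.
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        # Events occurring exactly at time t, overall and in group A.
        d = sum(1 for tt, e, _ in at_risk if tt == t and e == 1)
        d_a = sum(1 for tt, e, g in at_risk if tt == t and e == 1 and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("statistic undefined: zero variance (no informative events)")
    return o_minus_e ** 2 / variance


def chi2_df1_pvalue(stat):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))
```

For example, `logrank_statistic([1, 2], [1, 1], [3, 4], [1, 1])` returns 49/17 (about 2.88), because every death in the first group precedes every death in the second. The Breslow (Generalized Wilcoxon) statistic reported alongside it differs only in weighting each event time by the number of patients at risk, which gives early deaths more influence.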
