Survival stratification for colorectal cancer via multi-omics integration using an autoencoder-based model.

Hu Song, Jun Song, Teng Xu, Tao Jiang, Ruizhi Fan, Meng Cao, Chengwei Ruan, Yixin Xu

doi:10.1177/15353702211065010

Abstract

Prognosis stratification in colorectal cancer helps to address cancer heterogeneity and contributes to improving tailored treatments for colorectal cancer patients. In this study, an autoencoder-based model was implemented to predict the prognosis of colorectal cancer by integrating multi-omics data. DNA methylation, RNA-seq, and miRNA-seq data from The Cancer Genome Atlas (TCGA) database were integrated as input for the autoencoder, which produced 175 transformed features. The survival-related features were then used to group the samples by k-means clustering. The autoencoder-based strategy was compared with strategies based on principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), non-negative matrix factorization (NMF), and individual Cox proportional hazards (Cox-PH) models. Using the 175 transformed features, tumor samples were clustered into two groups (G1 and G2) with significantly different survival rates. The autoencoder-based strategy performed better at identifying survival-related features than the other transformation strategies. Further, the two survival groups were robustly validated using "hold-out" validation and five validation cohorts. Gene expression profiles, miRNA profiles, DNA methylation, and signaling pathway profiles differed between the poor prognosis group (G2) and the good prognosis group (G1). miRNA-mRNA networks were constructed using six differentially expressed miRNAs (let-7c, mir-34c, mir-133b, let-7e, mir-144, and mir-106a) and 19 predicted target genes. The autoencoder-based computational framework could distinguish good prognosis samples from poor prognosis samples and facilitate a better understanding of the molecular biology of colorectal cancer.
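
The clustering step described above, in which samples are split into two groups from their survival-related features, can be illustrated with a minimal k-means sketch. This is not the study's implementation: the function names, the pure-Python approach, and the two-dimensional toy feature values are all assumptions made for illustration, standing in for the survival-related features selected from the autoencoder output.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=2, iters=100, seed=0):
    """Plain k-means (Lloyd's algorithm); returns (labels, centroids).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Initialise centroids from k distinct samples.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids


# Toy stand-in for survival-related features (made-up values):
# one row per tumor sample, one column per transformed feature.
features = [
    [0.1, 0.2], [0.0, 0.1], [0.2, 0.0],   # samples resembling one group
    [2.0, 2.1], [2.2, 1.9], [1.9, 2.0],   # samples resembling the other
]

labels, centroids = kmeans(features, k=2)
print(labels)
```

The cluster indices returned here are arbitrary; deciding which cluster corresponds to the good prognosis group (G1) or the poor prognosis group (G2) would require comparing survival between the two clusters.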

Full Text