Abstract

Accurate prognosis of patients with cancer is important for the stratification of patients, the optimization of treatment strategies, and the design of clinical trials. Both clinical features and molecular data can be used for this purpose, for instance, to predict the survival of patients censored at specific time points. Multi-omics data, including genome-wide gene expression, methylation, protein expression, copy number alteration, and somatic mutation data, are becoming increasingly common in cancer studies. To harness the rich information in multi-omics data, we developed GDP (Group lasso regularized Deep learning for cancer Prognosis), a computational tool for survival prediction using both clinical and multi-omics data. GDP integrates a deep learning framework with the Cox proportional hazard (CPH) model and applies group lasso regularization to incorporate gene-level group prior knowledge into the model training process. We evaluated its performance on both simulated data and real data from The Cancer Genome Atlas (TCGA) project. On simulated data, our results supported the importance of group prior information in the regularization of the model: compared to standard lasso regularization, group lasso achieved higher prediction accuracy when group prior knowledge was provided. We also found that GDP performed better than CPH on complex survival data. Furthermore, analysis of real data demonstrated that GDP performed favorably against other methods in several cancers with large-scale omics data sets, such as glioblastoma multiforme, kidney renal clear cell carcinoma, and bladder urothelial carcinoma. In summary, we demonstrated that GDP is a powerful tool for the prognosis of patients with cancer, especially when large-scale molecular features are available.

Highlights

  • A feature of survival analysis is that part of the observed data is censored: the expected event had not happened to the patient by the end of the study, or the patient was lost to follow-up [4] (source: Genes 2019, 10, 240)

  • The GDP model contains three components: a fully connected deep learning framework with two hidden layers; a Cox proportional hazard (CPH) module connected to the output of the network; and group lasso regularization applied to the coefficients of the input layer of the neural network
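
The first and third components described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the GDP package's actual code: the `tanh` activation, the `sqrt(group size)` weighting, the `lam` strength, and the names `risk_score` and `group_lasso_penalty` are all assumptions introduced here.

```python
import numpy as np

def group_lasso_penalty(W_in, groups, lam=1e-3):
    """Group lasso penalty on the input-layer weights.

    W_in   : array of shape (n_features, n_hidden1)
    groups : list of index arrays, one per gene-level group of input features
    Assumption: each group is weighted by sqrt(group size), a common convention.
    """
    penalty = 0.0
    for idx in groups:
        block = W_in[idx, :]                     # all weights leaving this group
        penalty += np.sqrt(len(idx)) * np.linalg.norm(block)  # Frobenius norm
    return lam * penalty

def risk_score(X, params):
    """Forward pass: two hidden layers, then one linear output (log-risk)."""
    W1, b1, W2, b2, w_out = params
    h1 = np.tanh(X @ W1 + b1)                    # hidden layer 1 (tanh is assumed)
    h2 = np.tanh(h1 @ W2 + b2)                   # hidden layer 2
    return h2 @ w_out                            # one score per patient

# Toy usage: 6 features in 3 hypothetical gene-level groups, 5 patients.
rng = np.random.default_rng(0)
params = (rng.normal(size=(6, 4)), np.zeros(4),
          rng.normal(size=(4, 3)), np.zeros(3),
          rng.normal(size=3))
groups = [np.array([0, 1, 2]), np.array([3, 4]), np.array([5])]
X = rng.normal(size=(5, 6))
scores = risk_score(X, params)
penalty = group_lasso_penalty(params[0], groups)
```

Because the penalty uses an unsquared norm per group, it tends to drive all input weights of a group to zero together, which is how the gene-level grouping enters training. The risk scores would then be passed to the CPH module to form the survival loss.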

Introduction

Survival analysis, which models time-to-event outcomes, has been widely adopted in cancer studies, for example the docetaxel chemotherapy study for prostate cancer [1], pertuzumab effects on breast cancer therapies [2], and immunoscore on colorectal cancer patient survival [3]. SurvivalNet, combined with Bayesian optimization methods, has been applied to high-dimensional survival prediction in cancer [24]. Those studies did not consider group prior knowledge in the molecular features for survival analysis in cancer. We proposed a new integrated method and provided an open-source Python package named GDP (Group lasso regularized Deep learning for cancer Prognosis) for cancer survival analysis that takes advantage of gene-level group prior knowledge. GDP integrates the group lasso regularization method, a TensorFlow-based [25] deep learning framework, and the Cox proportional hazard (CPH) model to analyze partially censored cancer survival data. It shows higher accuracy than the lasso method for input with group prior knowledge in both simulated and real cancer survival data.
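
To make the handling of partially censored data concrete, the following is a minimal sketch of a Cox negative log partial likelihood that could serve as the training loss on the network's risk scores. It is an illustration under stated assumptions, not the GDP implementation: the function name is hypothetical, the Breslow form is used, and tied event times get no special treatment.

```python
import numpy as np

def cox_neg_log_partial_likelihood(risk, time, event):
    """Negative log partial likelihood of the Cox model (Breslow form).

    risk  : predicted log-risk per patient, shape (n,)
    time  : observed follow-up time per patient, shape (n,)
    event : 1 if the event was observed, 0 if the patient was censored
    Assumption: no tied event times (ties are not handled specially).
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)

    order = np.argsort(-time)                 # longest follow-up first
    risk, event = risk[order], event[order]

    # Running log-sum-exp: for patient i, the log of the summed exp(risk)
    # over everyone still at risk at that patient's time.
    log_risk_set = np.logaddexp.accumulate(risk)

    # Censored patients (event == 0) add no term of their own, but they
    # still appear in the risk sets of patients with earlier event times.
    return -np.sum((risk - log_risk_set) * event)

# Toy usage: three patients with equal risk scores, all events observed.
loss = cox_neg_log_partial_likelihood([0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [1, 1, 1])
```

In a full model this loss would be summed with a regularization term, such as a group lasso penalty on the input-layer weights, and minimized by gradient descent.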

Data Collection
TCGA Data Preprocessing
Data Simulation
GDP Model
Model Training
Model Evaluation and Feature Selection
Availabilities of Software
Group Lasso Prevents Overfitting During GDP Training
Influence of Group Size on the Performance of GDP Survival Prediction
GDP Performed Better than CPH under Complex Simulations
Comparison
GDP Performances on TCGA Cancer Data
Discussions