Abstract

e13555

Background: Survival analysis relates covariates to the time to an event in the presence of censored data. Compared with traditional statistical methods, machine learning approaches built on sophisticated and efficient computational algorithms are better able to handle complex, multi-dimensional medical data.

Methods: We developed an automated machine learning tool, MLsurvival, to analyze survival data of cancer patients. Its algorithms include statistical Cox regression and machine learning methods based on a linear model (elastic net), ensemble models (gradient boosting with least-squares or regression-tree base learners, and random forest) and support vector machines with linear and non-linear kernels. The MLsurvival workflow comprises four modules: preprocessing (removal or imputation of missing data, and feature standardization), feature selection (unsupervised filtering by multiple statistics, and supervised recursive feature elimination with cross-validation), modeling (hyperparameter tuning and performance evaluation) and prediction. To evaluate the tool, we analyzed medical data from 222 patients with stage II-III hepatocellular carcinoma (HCC) who underwent surgical resection, and developed five machine-learning-based models to estimate overall survival (OS). Models were trained on 155 patients with 300 features, including clinical information, somatic mutations and copy number variations, and were independently validated on the remaining 67 patients.

Results: The gradient boosting ensemble model fitted by MLsurvival on 48 selected features from the 155 training patients achieved the highest mean AUC and C-index. In the validation set of 67 patients, this model predicted half-year mortality with an AUC of 0.9 (95% CI, 0.771-1.029) and one-year mortality with an AUC of 0.897 (95% CI, 0.816-0.978). The model was also predictive of time to recurrence (p < 0.0001).
Furthermore, we applied the tool to survival analysis of extensive real-world data from patients with breast, lung and esophageal cancers, and most of the results showed superior accuracy and stable performance.

Conclusions: MLsurvival is an automated tool for survival analysis of cancer patients that performs well. The risk scoring system implemented in this tool offers a novel strategy for incorporating multi-dimensional risk factors to predict clinical outcomes. It contributes to a better understanding of the disease background and helps optimize clinical follow-up and therapeutic treatment for cancer patients.
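The abstract ranks candidate models by mean AUC and C-index but does not state which C-index variant MLsurvival computes. As a hedged illustration only, the sketch below implements Harrell's concordance index in plain Python. The function name `harrell_c_index` and the toy data are hypothetical, and skipping pairs with tied observed times is a simplifying assumption (libraries handle ties in different ways).

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float],
                    events: Sequence[bool],
                    risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had the event and subject j
    was observed strictly longer (pairs with tied times are skipped in this
    sketch). The pair is concordant when i also has the higher predicted
    risk; tied risk scores count as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier event in a pair
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical toy data: 4 patients, the third one censored at time 6.
    times = [2, 4, 6, 8]
    events = [True, True, False, True]
    risk = [0.9, 0.6, 0.7, 0.1]
    print(harrell_c_index(times, events, risk))  # 4 of 5 comparable pairs concordant
```

On the toy data, 5 pairs are comparable and 4 are concordant, so the script prints 0.8. A value of 1.0 means the risk score ranks every comparable pair correctly, and 0.5 corresponds to random ordering.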
