Glioblastoma multiforme (GM) is a malignant tumor of the central nervous system considered to be highly aggressive and often carrying a terrible survival prognosis. An accurate prognosis is therefore pivotal for deciding a good treatment plan for patients. In this context, computational intelligence applied to data of electronic health records (EHRs) of patients diagnosed with this disease can be useful to predict the patients’ survival time. In this study, we evaluated different machine learning models to predict survival time in patients suffering from glioblastoma and further investigated which features were the most predictive for survival time. We applied our computational methods to three different independent open datasets of EHRs of patients with glioblastoma: the Shieh dataset of 84 patients, the Berendsen dataset of 647 patients, and the Lammer dataset of 60 patients. Our survival time prediction techniques obtained concordance index (C-index) = 0.583 in the Shieh dataset, C-index = 0.776 in the Berendsen dataset, and C-index = 0.64 in the Lammer dataset, as best results in each dataset. Since the original studies regarding the three datasets analyzed here did not provide insights about the most predictive clinical features for survival time, we investigated the feature importance among these datasets. To this end, we then utilized Random Survival Forests, which is a decision tree-based algorithm able to model non-linear interaction between different features and might be able to better capture the highly complex clinical and genetic status of these patients. Our discoveries can impact clinical practice, aiding clinicians and patients alike to decide which therapy plan is best suited for their unique clinical status.
Read full abstract