Classification techniques have been widely applied to cancer survival prediction to predict whether a patient will survive or die. However, little attention has been paid to patients who are predicted to die. In this work, we treat survival prediction as a two-stage task: the first stage predicts whether the outcome is survival or death, and the second stage predicts the remaining lifespan of patients whose predicted outcome is death. To this end, we propose a two-stage machine learning model to enhance cancer survival prediction. In the first stage, a tree-based imbalanced ensemble classification method is proposed to classify the survivability of advanced-stage cancer patients. In the second stage, a selective ensemble regression method is proposed for survival time prediction, where a priori knowledge is used for feature selection and the mean proportion of error interval is proposed as the criterion for selecting base learners. Extensive computational studies on colorectal cancer data from the SEER database demonstrate that the proposed two-stage model achieves more accurate predictions than the one-stage regression model. The results show that the proposed classification approach effectively handles the imbalanced survivability data, and that the proposed regression method outperforms several state-of-the-art regression models.
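The control flow of the two-stage task can be illustrated with a minimal sketch. It shows only the routing between stages, under stated assumptions: the names `TwoStagePredictor`, `toy_classifier`, and `toy_regressor` are hypothetical, the single input feature is an invented risk score, and the toy functions are placeholders rather than the tree-based imbalanced ensemble classifier or the selective ensemble regressor proposed in this work.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Features = Sequence[float]


@dataclass
class TwoStagePredictor:
    """Route a patient through a classifier, then (if needed) a regressor."""

    classify: Callable[[Features], bool]  # True means predicted survival
    regress: Callable[[Features], float]  # predicted remaining lifespan (months)

    def predict(self, x: Features) -> Optional[float]:
        # Stage 1: predict whether the outcome is survival or death.
        if self.classify(x):
            return None  # predicted survivor: no survival time is estimated
        # Stage 2: estimate remaining lifespan only for predicted deaths.
        return self.regress(x)


# Toy stand-ins (assumptions for illustration, not the proposed models).
def toy_classifier(x: Features) -> bool:
    return x[0] < 0.5  # x[0] is a made-up risk score in [0, 1]


def toy_regressor(x: Features) -> float:
    return max(0.0, 24.0 * (1.0 - x[0]))  # made-up mapping to months


model = TwoStagePredictor(toy_classifier, toy_regressor)
results = [model.predict(x) for x in ([0.2], [0.75])]
print(results)  # [None, 6.0]
```

Keeping the two stages as separate callables means either one could be replaced, for example by an ensemble model, without changing the routing logic.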