Abstract

With the high incidence of cancer, it is crucial to analyze recorded patient information carefully and to plan treatment thoughtfully. Prostate cancer is a prevalent cancer among men and takes many lives annually. Machine learning methods can help in managing prostate cancer and in reducing the number of patients who die from it. In this research, we propose a hybrid methodology for predicting the survivability of patients with prostate cancer. As a pre-processing step, we apply the Factor Analysis of Mixed Data (FAMD) algorithm together with under-sampling methods to the SEER dataset. The main models are XGBoost, random forest (RF), support vector machine (SVM), and logistic regression (LR), with parameters tuned by cross-validation. We predict both binary-labeled and multi-class-labeled cases (the latter including other causes of death), a setting rarely investigated in related studies. For the sensitivity analysis, we used cluster centroids as the under-sampling method and examined different proportions of the majority and minority classes when training the binary classifier. This strategy showed that the ratio of the binary classes can influence prediction accuracy and can help prevent overfitting. Evaluated with appropriate criteria, such as G-mean, the XGBoost (86.28%) and SVM (67.81%) models outperformed the others for the two-class and three-class outcomes, respectively. Compared with similar studies, our method successfully separated patients by mortality status and by whether they died of prostate cancer, which can inform clinical decision making, including whether medical experts should change their treatment strategy.
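To illustrate why a criterion such as G-mean is preferred over plain accuracy on imbalanced survival labels, the following is a minimal sketch, not the authors' implementation. It assumes the common definition of G-mean as the geometric mean of per-class recalls (for two classes, the square root of sensitivity times specificity). The function names `per_class_recall` and `g_mean` and the toy labels are hypothetical, and the abstract does not state which multi-class variant was used.

```python
from collections import Counter
from math import prod


def per_class_recall(y_true, y_pred):
    """Return {label: recall} for every label that occurs in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed definition of G-mean)."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    recalls = per_class_recall(y_true, y_pred)
    return prod(recalls.values()) ** (1.0 / len(recalls))


# Hypothetical toy labels: 0 = majority class, 1 = minority class.
y_true = [0] * 8 + [1] * 2

# A classifier that always predicts the majority class reaches 80% accuracy
# but never detects the minority class, so its G-mean collapses to zero.
always_majority = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
score_majority = g_mean(y_true, always_majority)

# A classifier that finds one of the two minority cases at the cost of two
# majority errors has recalls 0.75 and 0.5.
balanced_guess = [0] * 6 + [1] * 2 + [1, 0]
score_balanced = g_mean(y_true, balanced_guess)
```

Because the score is a product of recalls, a model that ignores any one class scores zero regardless of its accuracy. The same function handles a three-class labeling without changes.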
