Machine learning approach for classification of prostate cancer based on clinical biomarkers

Onural Özhan,Fatma Hilal Yağin

doi:10.52876/jcs.1221425

Abstract

In this study, it is aimed to classify cancer based on machine learning (ML) and to determine the most important risk factors by using risk factors for prostate cancer patients. Clinical data of 100 patients with prostate cancer were used. A prediction model was created with the random forest (RF) algorithm to classify prostate cancer. The performance of the model was obtained by Monte-Carlo cross validation (MCCV) using balanced subsampling. In each MCCV, two-thirds (2/3) of the samples were used to assess the significance of the feature. In order to evaluate the performance of the model, graph, accuracy, sensitivity, specificity, positive predictive value, negative predictive value, F1-score and Area under the ROC Curve (AUC) criteria including prediction class probabilities and confusion matrix were calculated. When the results were examined, the sensitivity, specificity, positive predictive value, negative predictive value, accuracy, F1-score, and AUC values obtained from the RF model were 0.89, 0.84, 0.77, 0.93, 0.86, 0.83, and 0.88, respectively. Area, perimeter, and texture were the three most important risk factors for differentiating prostate cancer. In conclusion, when the RF algorithm can be successfully predicted prostate cancer. The important risk factors determined by the RF model may contribute to diagnosis, follow-up and treatment researches in prostate cancer patients.

Full Text