Abstract

Prostate cancer is the second most commonly diagnosed cancer in men and accounts for about 4% of cancer-related mortality in men. Several genetic polymorphisms in different genes have been identified that alter the risk of this malignancy. We used the random forest (RF) algorithm to predict prostate cancer risk in an Iranian population using 13 single nucleotide polymorphisms (SNPs) in four genes (ANRIL, HOTAIR, IL-6 and IL-8). The samples were divided into a training set (n=320) and a test set (n=80) to evaluate the generalization performance of the trained model. For hyperparameter tuning, we used randomized search with 5-fold cross-validation over the following hyperparameters: (1) the number of trees (estimators) in the forest (range 3 to 500); (2) the maximum number of leaf nodes (range 2 to 32); (3) the maximum number of features considered for the best split (range 5 to 13); and (4) whether bootstrap samples are used when building the trees (True or False). Accuracy, sensitivity, specificity, and F1-score are reported for both the training and test sets. The most important SNP was ANRIL-rs1333048: A/A (Gini importance = 0.096), followed by ANRIL-rs10757278: G/G (Gini importance = 0.059). Training set results were: accuracy 0.896, sensitivity 0.85, specificity 0.944, and F1-score 0.891. Test set results were: accuracy 0.787, sensitivity 0.775, specificity 0.800, and F1-score 0.784. The AUC scores were 0.966 and 0.841 for the training and test sets, respectively. The proposed panel of SNPs can predict the risk of prostate cancer in the Iranian population with acceptable accuracy.
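
The tuning and evaluation procedure described in the abstract can be sketched with scikit-learn. This sketch assumes scikit-learn's `RandomForestClassifier` and `RandomizedSearchCV`, because the abstract does not name a library. The genotype matrix and labels below are placeholders. The following choices are all assumptions: the 0/1/2 allele-count encoding, the stratified split, `n_iter=10`, accuracy as the search metric, and the `SNP_i` feature names. Only these details come from the abstract: the 320/80 split, 5-fold cross-validation, the four hyperparameter ranges, and the reported metrics. Because the data are random, the printed numbers will not reproduce the study's results.

```python
import numpy as np
from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Placeholder data (assumption): 400 subjects x 13 SNPs, each SNP coded as
# 0/1/2 copies of the minor allele; labels 1 = case, 0 = control.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(400, 13))
y = np.repeat([0, 1], 200)

# 320 training / 80 test subjects, as in the abstract (stratification assumed).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=80, stratify=y, random_state=0
)

# Search space matching the reported ranges (randint's upper bound is exclusive).
param_distributions = {
    "n_estimators": randint(3, 501),   # 3..500 trees
    "max_leaf_nodes": randint(2, 33),  # 2..32 leaf nodes
    "max_features": randint(5, 14),    # 5..13 features per split
    "bootstrap": [True, False],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=10,           # number of sampled configurations (assumption)
    cv=5,                # 5-fold cross-validation
    scoring="accuracy",  # selection metric (assumption)
    random_state=0,
    n_jobs=-1,
)
search.fit(X_train, y_train)
model = search.best_estimator_  # refit on the full training set by default


def evaluate(model, X, y):
    """Accuracy, sensitivity, specificity, F1 and AUC for a binary classifier."""
    y_pred = model.predict(X)
    tn, fp, fn, tp = confusion_matrix(y, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / len(y),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "f1": f1_score(y, y_pred),
        "auc": roc_auc_score(y, model.predict_proba(X)[:, 1]),
    }


train_metrics = evaluate(model, X_train, y_train)
test_metrics = evaluate(model, X_test, y_test)
print("best params:", search.best_params_)
print("train:", {k: round(float(v), 3) for k, v in train_metrics.items()})
print("test: ", {k: round(float(v), 3) for k, v in test_metrics.items()})

# Impurity-based (Gini) importances, ranked; SNP_i names are hypothetical.
importances = model.feature_importances_
for i in np.argsort(importances)[::-1][:2]:
    print(f"SNP_{i}: {importances[i]:.3f}")
```

The importances printed at the end are scikit-learn's impurity-based (mean decrease in Gini impurity) scores. These scores are the kind of per-feature Gini values the abstract uses to rank ANRIL-rs1333048 and ANRIL-rs10757278. If genotypes were one-hot encoded per SNP (as the "A/A" and "G/G" labels may suggest), `X` would have more than 13 columns. The `max_features` range would then need revisiting.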
