This study aims to develop machine learning (ML)-assisted models for analyzing datasets related to Gleason scores in prostate cancer, perform statistical analyses on these datasets, and identify meaningful features. We retrospectively collected data from 717 patients with hormone-sensitive prostate cancer (HSPC) at Yunnan Cancer Hospital, of whom 526 were included in modeling. Based on 21 clinical and biochemical indicators, seven auxiliary models were built using logistic regression (LR), support vector machine (SVM), random forest (RF), decision tree (DT), extreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), and an artificial neural network (ANN). Model performance was evaluated primarily by accuracy (ACC), precision (PRE), specificity (SPE), sensitivity (SEN, also called recall), F1 score, and area under the curve (AUC), and was visualized with confusion matrices and receiver operating characteristic (ROC) curves. The ensemble learning methods (RF, XGBoost, and AdaBoost) performed best. RF achieved a training score of 0.769 (95% CI: 0.759–0.835) and a testing score of 0.755 (95% CI: 0.660–0.760), with an AUC of 0.786 (95% CI: 0.722–0.803). XGBoost achieved a training score of 0.755 (95% CI: 0.711–0.809) and a testing score of 0.745 (95% CI: 0.660–0.764), with an AUC of 0.777 (95% CI: 0.726–0.798). AdaBoost achieved a training score of 0.789 (95% CI: 0.782–0.857) and a testing score of 0.774 (95% CI: 0.651–0.774), with an AUC of 0.799 (95% CI: 0.703–0.802). In terms of feature importance (FI), bone metastases at first visit, prostate volume, age, and T1–T2 stage accounted for large shares of RF's FI; free PSA (fPSA), total PSA (TPSA), and tumor burden accounted for large shares of AdaBoost's FI; and the f/TPSA ratio, lactate dehydrogenase (LDH), and testosterone had the highest FI in XGBoost.
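The evaluation metrics listed above can all be derived from a binary confusion matrix plus, for AUC, the models' predicted scores. The sketch below is a minimal plain-Python illustration, assuming a binary split of Gleason score groups with label 1 as the positive class (the abstract does not state how the classes were defined). The names `BinaryMetrics`, `compute_metrics`, and `roc_auc` and the toy labels are hypothetical; they are not the study's code or data. AUC is computed with the rank-based (Mann–Whitney) formulation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    acc: float  # accuracy: (TP + TN) / all cases
    pre: float  # precision: TP / (TP + FP)
    spe: float  # specificity: TN / (TN + FP)
    sen: float  # sensitivity (recall): TP / (TP + FN)
    f1: float   # harmonic mean of precision and sensitivity


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 rather than raising when a denominator is zero
    return num / den if den else 0.0


def compute_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> BinaryMetrics:
    """Confusion-matrix metrics for binary labels, with 1 as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1  # t == 1 and p == 0
    return BinaryMetrics(
        acc=_safe_div(tp + tn, len(y_true)),
        pre=_safe_div(tp, tp + fp),
        spe=_safe_div(tn, tn + fp),
        sen=_safe_div(tp, tp + fn),
        # Equals 2 * pre * sen / (pre + sen) whenever both are nonzero
        f1=_safe_div(2 * tp, 2 * tp + fp + fn),
    )


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score_pos > score_neg), counting ties as 0.5."""
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy labels and scores, not data from the study
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
    scores = [0.9, 0.4, 0.8, 0.2, 0.6, 0.7, 0.65, 0.1]
    print(compute_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

For the toy data, the confusion matrix has TP = 3, FN = 1, TN = 2, and FP = 2, giving a precision of 0.6, a sensitivity of 0.75, a specificity of 0.5, and an AUC of 0.8125. The pairwise AUC loop is O(n·m) and is written for readability; scikit-learn's `roc_auc_score` computes the same quantity more efficiently.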
Our findings indicate that ensemble learning methods perform well in classifying HSPC patient data, with TNM staging and fPSA being important classification indicators. These findings provide a useful reference for distinguishing between Gleason scores, supporting more accurate patient assessment and personalized treatment planning.