Objectives: To develop a machine learning model that combines PI-RADS score, PSA density, and clinical variables to predict clinically significant prostate cancer (csPCa).

Methods: We evaluated a cohort of patients who underwent prostate biopsy for suspected prostate cancer (PCa) in New Zealand, Australia, and Switzerland. We collected data on age, body mass index (BMI), PSA level, prostate volume, PSA density (PSAD), PI-RADS score, previous biopsy, and the corresponding histology results. The dataset was randomly split into derivation (training) and validation (test) sets. An independent dataset was obtained from the Harvard Dataverse for external validation. A cohort of 1272 patients was analyzed. We fitted a Lasso model, XGBoost, and LightGBM to the training set and assessed their predictive performance.

Results: The models achieved ROC AUC values ranging from 0.830 to 0.851. LightGBM was considered the best model, with an ROC AUC of 0.851 [95% CI: 0.804 – 0.897] in the test set and 0.818 [95% CI: 0.798 – 0.831] in the external dataset. The most important variable was PI-RADS, followed by PSA density, history of previous biopsy, age, and BMI.

Conclusions: We developed a predictive model for detecting csPCa that achieved high ROC AUC values in both internal and external validation. This suggests that integrating the clinical parameters outperforms any individual predictor. The model was also well calibrated, indicating a more balanced model than existing ones.
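
To illustrate two quantities named in the abstract, the following is a minimal Python sketch of how PSA density and an ROC AUC with a confidence interval could be computed. It is not the authors' pipeline. The function names (`psa_density`, `roc_auc`, `bootstrap_auc_ci`) are hypothetical, the toy labels and scores are synthetic, and the percentile bootstrap is an assumption, since the abstract does not state how the reported intervals were obtained.

```python
import random


def psa_density(psa_ng_ml, prostate_volume_ml):
    """PSA density: serum PSA (ng/mL) divided by prostate volume (mL)."""
    if prostate_volume_ml <= 0:
        raise ValueError("prostate volume must be positive")
    return psa_ng_ml / prostate_volume_ml


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive case scores
    higher than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    if len(set(labels)) < 2:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacked one class; AUC undefined, draw again
        aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Synthetic toy data, for illustration only (not from the study).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)  # 0.75
toy_ci = bootstrap_auc_ci(toy_labels, toy_scores, n_boot=200)
```

In practice the scores would be predicted csPCa probabilities from a fitted model on the held-out test set, and a library routine would normally replace the quadratic pairwise loop used here for clarity.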