Abstract

The management of prostate cancer (PCa) diagnosed after PSA testing relies on risk stratification. Treatments carry significant urinary, digestive and sexual toxicities that can affect patients' quality of life. Many patients do not benefit from treatment, either because their cancer is indolent or because they die from competing causes. New models are needed to stratify patients at diagnosis and determine the best therapeutic strategy, and machine learning techniques can provide effective tools to guide decision-making. We used data from the prospective Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial and selected the patients who were diagnosed with PCa during follow-up. To assess the predictive power of a simple set of questions as a baseline indicator of current clinical practice, we selected features from the dataset related to the patients' prostate cancer diagnosis, medical history, physical activity, socio-economic status and hormonal status:
(1) Prostate cancer diagnosis: PSA, T, N and M stage, Gleason score and initial primary treatment (if performed).
(2) Medical history: age, height, weight, current smoking status, smoking pack-years, daily alcohol consumption, and history of prostatitis, nocturia, arthritis, bronchitis, diabetes, emphysema, heart attack, hypertension, liver disease, osteoporosis, stroke and cholesterol.
(3) Physical activity: activity at least once a month during the last year, and physical activity at work.
(4) Socio-economic status: family income and education.
(5) Hormonal status: hair pattern at age 45 and weight gain pattern.
We trained two gradient-boosting models on these features to predict 10-year cancer-specific survival (CSS) and overall survival (OS). Hyperparameters were selected on the training dataset using nested cross-validation with Bayesian optimization. To assess model performance, we used a non-parametric bootstrap procedure with 200 iterations.
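The abstract does not say whether the bootstrap resampled only the test-set predictions or retrained the models on each resample. The Python sketch below assumes the simpler variant: it resamples the patients of a held-out set with replacement 200 times and reports the mean and standard deviation of accuracy, AUROC and a step-wise estimate of the area under the precision-recall curve (average precision), which would yield figures in the "value (± spread)" form used in the results. The function names and the 0.5 decision threshold are illustrative assumptions, and the usage data are synthetic, not PLCO data.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Metric = Callable[[Sequence[int], Sequence[float]], float]


def accuracy(y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of patients whose thresholded score (score >= threshold) matches the 0/1 label."""
    hits = sum(int(s >= threshold) == y for y, s in zip(y_true, y_score))
    return hits / len(y_true)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Step-wise estimate of the area under the precision-recall curve (ties keep input order)."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    true_pos, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            total += true_pos / rank  # precision at each step where recall increases
    return total / sum(y_true)


def bootstrap_metric(metric: Metric, y_true: Sequence[int], y_score: Sequence[float],
                     n_boot: int = 200, seed: int = 0) -> Tuple[float, float]:
    """Non-parametric bootstrap: resample patients with replacement, return (mean, std) of the metric."""
    if len(set(y_true)) < 2:
        raise ValueError("y_true must contain both classes")
    rng = random.Random(seed)
    n = len(y_true)
    values: List[float] = []
    while len(values) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_true = [y_true[i] for i in idx]
        if len(set(sample_true)) < 2:
            continue  # AUROC and AUPRC are undefined when a resample contains a single class
        sample_score = [y_score[i] for i in idx]
        values.append(metric(sample_true, sample_score))
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    # Synthetic labels and scores for illustration only (not PLCO data).
    gen = random.Random(42)
    labels = [gen.randint(0, 1) for _ in range(200)]
    scores = [0.3 * y + 0.7 * gen.random() for y in labels]
    for name, fn in [("accuracy", accuracy), ("AUROC", roc_auc), ("AUPRC", average_precision)]:
        mean, std = bootstrap_metric(fn, labels, scores)
        print(f"{name}: {mean:.2f} (± {std:.2f})")
```

Skipping single-class resamples is one design choice among several; on a test set the size of the one described here, such resamples are essentially impossible, so the guard only matters for very small inputs.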
Understanding the models' predictions is essential in this setting: we need to know whether a prediction relies on the aggressiveness of the PCa or on other factors such as comorbidities. To this end, we used Shapley values to explain predictions at both the population and the individual level. During follow-up, 8,776 patients were diagnosed with PCa. The dataset was split into a training set (n = 7,021) and a testing set (n = 1,755). Accuracy was 0.87 (± 0.02) and 0.98 (± 0.01) for OS and CSS, respectively. The area under the receiver operating characteristic curve (AUROC) was 0.84 (± 0.02) and 0.81 (± 0.04), and the area under the precision-recall curve (AUPRC) was 0.6 (± 0.03) and 0.55 (± 0.07), for OS and CSS respectively. The models also provide an explanation for each prediction and are deployed online at http://prostatecancersurvival.stanford.edu. Using prospective data, we trained two models that predict 10-year CSS and OS with high accuracy. These models provide interpretable predictions to support informed decision-making in PCa treatment.
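The abstract does not state how the Shapley values were computed; for gradient-boosted trees, packages such as `shap` (its `TreeExplainer`) compute them efficiently. The Python sketch below instead shows the definition directly: it enumerates every coalition of features and weights each marginal contribution by |S|!(n - |S| - 1)!/n!, which is exact but exponential in the number of features, so it only suits small toy models. Two assumptions are labeled here: features outside a coalition are set to a single hypothetical reference patient (a "baseline" value function, whereas SHAP implementations typically use a background dataset or tree-path expectations), and `toy_risk` is a hypothetical score, not the paper's model.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set

Model = Callable[[Sequence[float]], float]


def exact_shapley(f: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition: Set[int]) -> float:
        # Evaluate the model with coalition features taken from x and all others from the baseline.
        return f([x[j] if j in coalition else baseline[j] for j in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z: Sequence[float]) -> float:
    """Hypothetical risk score (NOT the paper's model): age, Gleason and comorbidity terms plus one interaction."""
    age, gleason, comorbidities = z
    return 0.02 * age + 0.3 * gleason + 0.2 * comorbidities + 0.05 * gleason * comorbidities


if __name__ == "__main__":
    patient = [72.0, 8.0, 3.0]    # hypothetical: age, Gleason score, number of comorbidities
    reference = [65.0, 6.0, 1.0]  # hypothetical reference patient used as the baseline
    contributions = exact_shapley(toy_risk, patient, reference)
    for name, phi in zip(["age", "gleason", "comorbidities"], contributions):
        print(f"{name}: {phi:+.3f}")
    # Efficiency property: the contributions sum to f(patient) - f(reference).
    print(sum(contributions), toy_risk(patient) - toy_risk(reference))
```

For an individual patient, comparing the contribution of the Gleason term with that of the comorbidity term is the kind of question the text raises (tumor aggressiveness versus competing risks); averaging absolute contributions over many patients is one common way to obtain a population-level importance ranking.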
