Abstract

Shapley Additive Explanation (SHAP) values represent a unified approach to interpreting predictions made by complex machine learning (ML) models (Lundberg et al. NIPS 2017). SHAP values have been applied in other fields, demonstrating superior consistency and concordance with human intuition compared to other interpretation approaches. We describe a novel application of SHAP values to the prediction of overall survival (OS) in prostate cancer patients. Patients with non-metastatic prostate cancer, diagnosed from 2004 to 2015, were identified using the National Cancer Database. We specified a priori the model features: age, prostate-specific antigen (PSA), Gleason score, primary Gleason pattern, percent positive cores (PPC), comorbidity score, and clinical T stage. We trained a gradient-boosted regression tree model and applied SHAP values to model predictions for interpretation and feature attribution. After visualization using SHAP values, we used Kaplan-Meier estimates and Cox proportional hazards regression for survival analysis. Open-source libraries (scikit-learn, xgboost, and shap) in Python 3.7 were used for all analyses. We identified 281,466 patients meeting inclusion criteria. We first demonstrated consistency with literature using the example of low PSA, high Gleason prostate cancer, which was recently identified as a unique entity with poor prognosis. SHAP interaction values provide an elegant illustration of the interaction between low PSA and Gleason 9-10 and show that the same interaction does not exist in lower Gleason cancers. We applied this same methodology to the interaction between PPC and Gleason score, yielding new insights. We identified a stronger interaction effect in patients with Gleason 8+ disease compared to patients with Gleason 6-7 disease, particularly with PPC≥50%. 
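The idea behind SHAP interaction values can be illustrated with a brute-force sketch on a toy model. Everything here is an assumption made for illustration: the three binary features, the `toy_risk` function (in which low PSA adds risk only when Gleason is high), and the uniform background set are hypothetical and are not taken from the study. The study itself used the shap library on a gradient-boosted tree model, which computes these quantities with a far more efficient tree-specific algorithm; the exhaustive subset enumeration below is only practical for a handful of features.

```python
from itertools import combinations, product
from math import factorial

# Hypothetical binary features: (high_gleason, low_psa, older_age).
def toy_risk(x):
    g, p, a = x
    # Low PSA contributes only through its product with high Gleason.
    return 1.0 * g + 0.5 * a + 2.0 * g * p

# Assumed background data: every feature combination, equally weighted.
BACKGROUND = list(product((0, 1), repeat=3))

def value(subset, x):
    """Mean model output with features in `subset` fixed to x, others from background."""
    total = 0.0
    for b in BACKGROUND:
        mixed = tuple(x[k] if k in subset else b[k] for k in range(len(x)))
        total += toy_risk(mixed)
    return total / len(BACKGROUND)

def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def shapley_value(i, x):
    """Exact Shapley value of feature i by enumerating all coalitions."""
    m = len(x)
    others = [k for k in range(m) if k != i]
    phi = 0.0
    for s in subsets(others):
        w = factorial(len(s)) * factorial(m - len(s) - 1) / factorial(m)
        phi += w * (value(s | {i}, x) - value(s, x))
    return phi

def interaction_value(i, j, x):
    """Shapley interaction value; the diagonal holds the remaining main effect."""
    m = len(x)
    if i == j:
        off_diagonal = sum(interaction_value(i, k, x) for k in range(m) if k != i)
        return shapley_value(i, x) - off_diagonal
    others = [k for k in range(m) if k not in (i, j)]
    phi = 0.0
    for s in subsets(others):
        w = factorial(len(s)) * factorial(m - len(s) - 2) / (2 * factorial(m - 1))
        delta = (value(s | {i, j}, x) - value(s | {i}, x)
                 - value(s | {j}, x) + value(s, x))
        phi += w * delta
    return phi

patient = (1, 1, 1)  # high Gleason, low PSA, older age
phi = [shapley_value(i, patient) for i in range(3)]
inter = [[interaction_value(i, j, patient) for j in range(3)] for i in range(3)]
baseline = value(frozenset(), patient)
```

For this toy patient, the attributions in `phi` sum to `toy_risk(patient) - baseline`, and the only non-zero off-diagonal entries of `inter` are those pairing high Gleason with low PSA, which is the pattern an interaction plot would surface.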
Subsequent confirmatory analysis using linear models supported this finding: using Kaplan-Meier estimates, 5-year OS was 87.7% in Gleason 8+ patients with PPC<50% versus 77.2% in patients with PPC≥50% (p<0.001), compared to 89.1% versus 86.0% in Gleason 7 patients (p<0.001). The Cox interaction term between PPC≥50% and Gleason score 8+ was highly significant (p<0.001). Source code, visualizations, and detailed explanations are available at: https://richardjli.github.io/shap. We describe a novel application of SHAP values for modeling and visualizing nonlinear interaction effects in prostate cancer, demonstrating consistency with published literature and capability to generate new insights. We found a significant interaction between Gleason score 8+ and percent positive cores, suggesting that PPC should be incorporated more robustly into the stratification of high-risk patients. This ML-based approach, used in combination with traditional linear models, offers considerable potential to meaningfully improve risk stratification and staging systems.
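The Kaplan-Meier comparison described above can be sketched in a few lines of plain Python. This is a minimal product-limit estimator under stated assumptions, not the authors' analysis code: the function names are hypothetical, and the follow-up times and event flags are made-up toy values that bear no relation to the National Cancer Database cohort. A real analysis would normally rely on an established survival library and would also fit the Cox model with the interaction term, which is not shown here.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each distinct death time.

    times  -- follow-up time per patient (years)
    events -- 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        n_at_risk -= leaving
    return curve

def survival_at(curve, t):
    """Read the step function: estimated survival probability at time t."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s

# Toy groups (hypothetical values, for illustration only).
low_ppc_times, low_ppc_events = [2, 4, 6, 7, 8, 9], [0, 1, 0, 0, 0, 0]
high_ppc_times, high_ppc_events = [1, 2, 3, 4, 6, 8], [1, 0, 1, 1, 0, 0]

low_ppc_5yr = survival_at(kaplan_meier(low_ppc_times, low_ppc_events), 5)
high_ppc_5yr = survival_at(kaplan_meier(high_ppc_times, high_ppc_events), 5)
```

Censored patients reduce the size of the risk set without lowering the curve, which is why the estimator only multiplies in a new factor at times where at least one death occurs.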
