Abstract

Background: Online clinical risk prediction tools built on data from multiple cohorts are increasingly being utilized for contemporary doctor-patient decision-making and validation. This report outlines a comprehensive data science strategy for building such tools, with application to the Prostate Biopsy Collaborative Group prostate cancer risk prediction tool.

Methods: We created models for high-grade prostate cancer risk using six established risk factors. The data comprised 8492 prostate biopsies collected from ten institutions, two in Europe and eight across North America. We calculated the area under the receiver operating characteristic curve (AUC) for discrimination, the Hosmer-Lemeshow test statistic (HLS) for calibration, and the clinical net benefit at a risk threshold of 15%. We implemented several internal cross-validation schemes to assess the influence of modeling method and individual cohort on validation performance.

Results: High-grade disease prevalence ranged from 18% in Zurich (1863 biopsies) to 39% in UT Health San Antonio (899 biopsies). Visualization revealed outliers in terms of risk factors, including San Juan VA (51% abnormal digital rectal exam), Durham VA (63% African American), and Zurich (2.8% family history). Exclusion of any cohort did not significantly affect the AUC or HLS, nor did the choice of prediction model (pooled, random-effects, meta-analysis). Excluding the lowest-prevalence Zurich cohort from training sets did not statistically significantly change the validation metrics for any of the individual cohorts, except for Sunnybrook, where the effect on the AUC was minimal. Therefore the final multivariable logistic regression model was built by pooling the data from all cohorts. Higher prostate-specific antigen and age, abnormal digital rectal exam, African ancestry and a family history of prostate cancer increased risk of high-grade prostate cancer, while a history of a prior negative prostate biopsy decreased risk (all p-values < 0.004).

Conclusions: We have outlined a multi-cohort model-building and internal validation strategy for developing globally accessible and scalable risk prediction tools.
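One of the internal validation ideas in the abstract, holding out each cohort in turn and scoring a model trained on the rest, can be illustrated with a minimal sketch. This is not the authors' code: the function names `auc` and `leave_one_cohort_out`, the `(cohort, features, outcome)` row layout, and the caller-supplied `fit` and `predict` callables are all assumptions made for illustration. The AUC is computed directly from its definition as the probability that a randomly chosen high-grade case receives a higher predicted risk than a randomly chosen non-case.

```python
def auc(risks, outcomes):
    """AUC as the fraction of case/non-case pairs ranked correctly (ties count 1/2)."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


def leave_one_cohort_out(rows, fit, predict):
    """Hold out each cohort once; train on the others; return {cohort: AUC}.

    rows    : list of (cohort, features, outcome) tuples (assumed layout)
    fit     : callable taking a list of (features, outcome), returning a model
    predict : callable taking (model, features), returning a predicted risk
    """
    results = {}
    for held_out in sorted({cohort for cohort, _, _ in rows}):
        train = [(x, y) for cohort, x, y in rows if cohort != held_out]
        test = [(x, y) for cohort, x, y in rows if cohort == held_out]
        model = fit(train)  # model never sees the held-out cohort
        risks = [predict(model, x) for x, _ in test]
        results[held_out] = auc(risks, [y for _, y in test])
    return results
```

In practice `fit` would wrap whichever modeling method is being compared (for example a pooled logistic regression), so the same loop can be reused to contrast methods on identical held-out cohorts.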

Highlights

  • Online clinical risk prediction tools built on data from multiple cohorts are increasingly being utilized for contemporary doctor-patient decision-making and validation

  • Biopsy results, including grade of prostate cancer, were collected along with the pre-biopsy risk factors prostate-specific antigen (PSA), digital rectal exam (DRE), age, African ancestry, first-degree family history of prostate cancer, and whether a prior negative prostate biopsy had ever been performed

  • Out-of-sample prediction criteria: We graded the performance of the risk tools in terms of discrimination, calibration, and net benefit. Metrics assessing these features are best observed as curves dependent on thresholds of the risk for referral to biopsy, as we have reported for the online Prostate Biopsy Collaborative Group (PBCG) risk tool [7]
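To make the threshold dependence of net benefit concrete, here is a minimal sketch using the standard decision-curve definition, net benefit = TP/n - (FP/n) x t/(1 - t), where t is the risk threshold for referral to biopsy. The function names, the convention that a predicted risk at or above the threshold counts as a referral, and the "biopsy everyone" reference strategy are assumptions for illustration, not details taken from the paper.

```python
def net_benefit(risks, outcomes, threshold=0.15):
    """Net benefit of referring patients with predicted risk >= threshold.

    True positives are credited in full; false positives are penalized by
    the odds of the threshold, t / (1 - t).
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_biopsy_all(outcomes, threshold=0.15):
    """Reference strategy (assumed for comparison): refer every patient."""
    prev = sum(outcomes) / len(outcomes)
    return prev - (1 - prev) * threshold / (1 - threshold)


def net_benefit_curve(risks, outcomes, thresholds):
    """Evaluate net benefit over a grid of thresholds, as (threshold, value) pairs."""
    return [(t, net_benefit(risks, outcomes, t)) for t in thresholds]
```

A model is useful at a given threshold only if its net benefit exceeds both the biopsy-everyone value and zero (the value of referring no one), which is why the comparison is usually read off a curve rather than at a single point.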

Introduction

Online clinical risk prediction tools built on data from multiple cohorts are increasingly being utilized for contemporary doctor-patient decision-making and validation. As technical, reporting and other changes in prostate cancer occurred globally, such as the systematic increase in the number of biopsy cores to improve detection, the need arose to collect contemporary real-time data outside of the screening/prevention trial framework in order to expediently adapt the online risk tools to modern practice [5]. To this end, the Prostate Biopsy Collaborative Group (PBCG) was formed to prospectively collect the standard risk factors and prostate biopsy outcomes from ten diverse international centers in Europe, North America and its territories [6]. A secondary aim of the PBCG was to find scalable methods for multi-cohort risk modeling that would enable the addition of cohorts in the future, as well as the addition or modification of data from existing cohorts, once funding for centralized data processing ceased.
