Abstract

Background: We suggest an adaptive sample size calculation method for developing clinical prediction models, in which model performance is monitored sequentially as new data come in.

Methods: We illustrate the approach using data for the diagnosis of ovarian cancer (n = 5914, 33% event fraction) and obstructive coronary artery disease (CAD; n = 4888, 44% event fraction). We used logistic regression to develop a prediction model consisting only of a priori selected predictors and assumed linear relations for continuous predictors. We mimicked prospective patient recruitment by developing the model on 100 randomly selected patients, and we used bootstrapping to internally validate the model. We sequentially added 50 random new patients until we reached a sample size of 3000 and re-estimated model performance at each step. We examined the required sample size for satisfying the following stopping rule: obtaining a calibration slope ≥ 0.9 and optimism in the c-statistic (or AUC) ≤ 0.02 at two consecutive sample sizes. This procedure was repeated 500 times. We also investigated the impact of alternative modeling strategies: modeling nonlinear relations for continuous predictors and correcting for bias in the model estimates (Firth’s correction).

Results: Better discrimination was achieved in the ovarian cancer data (c-statistic 0.9 with 7 predictors) than in the CAD data (c-statistic 0.7 with 11 predictors). Adequate calibration and limited optimism in discrimination were achieved after a median of 450 patients (interquartile range 450–500) for the ovarian cancer data (22 events per parameter (EPP), 20–24) and 850 patients (750–900) for the CAD data (33 EPP, 30–35). A stricter criterion, requiring AUC optimism ≤ 0.01, was met with a median of 500 (23 EPP) and 1500 (59 EPP) patients, respectively. These sample sizes were much higher than those implied by the well-known 10 EPP rule of thumb and slightly higher than those given by a recently published fixed sample size calculation method by Riley et al. Higher sample sizes were required when nonlinear relationships were modeled, and lower sample sizes when Firth’s correction was used.

Conclusions: Adaptive sample size determination can be a useful supplement to fixed a priori sample size calculations, because it allows the sample size to be tailored to the specific prediction modeling context in a dynamic fashion.
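The stopping rule described in the Methods can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the `Step` record, the function name `first_stopping_n`, and the example trajectory are hypothetical, and the sketch assumes that the calibration slope and AUC optimism at each sample size have already been estimated (for example by bootstrapping). It also assumes that the reported sample size is the second of the two consecutive sizes at which the criteria hold, which the abstract does not state explicitly.

```python
from typing import Iterable, NamedTuple, Optional


class Step(NamedTuple):
    """Performance estimates at one interim sample size (hypothetical record)."""
    n: int                    # current sample size
    calibration_slope: float  # bootstrap-corrected calibration slope
    auc_optimism: float       # estimated optimism in the c-statistic (AUC)


def first_stopping_n(
    steps: Iterable[Step],
    min_slope: float = 0.9,
    max_optimism: float = 0.02,
    consecutive: int = 2,
) -> Optional[int]:
    """Return the first n at which both criteria have held for
    `consecutive` successive sample sizes, or None if never met."""
    run = 0  # number of successive steps that satisfied both criteria
    for step in steps:
        ok = (step.calibration_slope >= min_slope
              and step.auc_optimism <= max_optimism)
        run = run + 1 if ok else 0  # any failure resets the count
        if run >= consecutive:
            return step.n
    return None


# Made-up trajectory: start at 100 patients, add 50 per step.
trajectory = [
    Step(100, 0.71, 0.060),
    Step(150, 0.80, 0.041),
    Step(200, 0.91, 0.025),  # slope ok, optimism too high
    Step(250, 0.92, 0.019),  # both ok (1st in a row)
    Step(300, 0.89, 0.018),  # slope drops below 0.9, count resets
    Step(350, 0.93, 0.017),  # both ok (1st in a row)
    Step(400, 0.94, 0.015),  # both ok (2nd in a row): stop here
]

stop_n = first_stopping_n(trajectory)
strict_n = first_stopping_n(trajectory, max_optimism=0.01)
```

Requiring the criteria at two consecutive sample sizes guards against stopping on a single favorable fluctuation, as the reset at 300 patients in the made-up trajectory shows. Passing `max_optimism=0.01` corresponds to the stricter criterion mentioned in the Results.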

Highlights

  • We suggest an adaptive sample size calculation method for developing clinical prediction models, in which model performance is monitored sequentially as new data comes in

  • Events-per-variable rules refer to events per considered model coefficient in a regression analysis, but they are sometimes incorrectly interpreted in terms of events per variable in the final model

  • We aimed to extend a priori fixed sample size calculations with an adaptive approach that dynamically learns from model performance as new data comes in

Summary

Introduction

We suggest an adaptive sample size calculation method for developing clinical prediction models, in which model performance is monitored sequentially as new data come in. Clinical prediction models, such as diagnostic and prognostic models, are ubiquitous in the literature [1,2,3]. A well-known rule of thumb is to have a minimum of 10 events per variable (EPV) in the smallest outcome group [4]; EPV > 20 has also been suggested [5]. These rules refer to events per considered model coefficient (excluding the intercept) in a regression analysis, but they are sometimes incorrectly interpreted in terms of events per variable in the final model (i.e., excluding variables eliminated by any data-driven variable selection procedure). We will use the term “events per candidate predictor parameter” (EPP) instead, in line with a recent publication [6].
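As a small worked illustration of the EPP definition above, the following Python sketch computes EPP from a sample size, an event fraction, and a number of candidate predictor parameters, and inverts the calculation to get the sample size implied by a target EPP. The function names are hypothetical, and the sketch assumes that "events" are counted in the smaller outcome group and that the intercept is excluded from the parameter count, as described in the text.

```python
import math


def events_per_parameter(n: int, event_fraction: float, n_parameters: int) -> float:
    """EPP: events in the smaller outcome group divided by the number of
    candidate predictor parameters (intercept excluded)."""
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    minority_fraction = min(event_fraction, 1.0 - event_fraction)
    return n * minority_fraction / n_parameters


def n_for_target_epp(target_epp: float, event_fraction: float, n_parameters: int) -> int:
    """Smallest sample size whose expected EPP reaches target_epp."""
    minority_fraction = min(event_fraction, 1.0 - event_fraction)
    if minority_fraction <= 0:
        raise ValueError("event_fraction must be strictly between 0 and 1")
    return math.ceil(target_epp * n_parameters / minority_fraction)


# Sample size implied by the 10 EPP rule of thumb for a model with
# 7 candidate parameters and a 33% event fraction.
n_rule_of_thumb = n_for_target_epp(10, 0.33, 7)

# Expected EPP at 450 patients under the same assumptions.
epp_at_450 = events_per_parameter(450, 0.33, 7)
```

Values computed this way are expected EPPs based on the overall event fraction; they need not match EPPs observed in a particular random sample, where the realized number of events varies.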
