Adaptive sample size determination for the development of clinical prediction models

Evangelia Christodoulou, Ben Van Calster, Dirk Timmerman, Maarten Van Smeden, Ewout W Steyerberg, Maria Wanitschek, Michael Edlinger

doi:10.1186/s41512-021-00096-5

Abstract

Background
We suggest an adaptive sample size calculation method for developing clinical prediction models, in which model performance is monitored sequentially as new data comes in.

Methods
We illustrate the approach using data for the diagnosis of ovarian cancer (n = 5914, 33% event fraction) and obstructive coronary artery disease (CAD; n = 4888, 44% event fraction). We used logistic regression to develop a prediction model consisting only of a priori selected predictors, assuming linear relations for continuous predictors. We mimicked prospective patient recruitment by developing the model on 100 randomly selected patients, and we used bootstrapping to internally validate the model. We then sequentially added 50 new random patients at a time until we reached a sample size of 3000, re-estimating model performance at each step. We examined the sample size required to satisfy the following stopping rule: a calibration slope ≥ 0.9 and optimism in the c-statistic (or AUC) ≤ 0.02 at two consecutive sample sizes. This procedure was repeated 500 times. We also investigated the impact of alternative modeling strategies: modeling nonlinear relations for continuous predictors, and correcting the model estimates for bias (Firth’s correction).

Results
Better discrimination was achieved in the ovarian cancer data (c-statistic 0.9 with 7 predictors) than in the CAD data (c-statistic 0.7 with 11 predictors). Adequate calibration and limited optimism in discrimination were achieved after a median of 450 patients (interquartile range 450–500) for the ovarian cancer data (22 events per parameter (EPP), 20–24) and 850 patients (750–900) for the CAD data (33 EPP, 30–35). A stricter criterion, requiring AUC optimism ≤ 0.01, was met with a median of 500 (23 EPP) and 1500 (59 EPP) patients, respectively. These sample sizes were much higher than the well-known 10 EPP rule of thumb and slightly higher than a recently published fixed sample size calculation method by Riley et al. Higher sample sizes were required when nonlinear relationships were modeled, and lower sample sizes when Firth’s correction was used.

Conclusions
Adaptive sample size determination can be a useful supplement to fixed a priori sample size calculations, because it allows the sample size to be tailored dynamically to the specific prediction modeling context.
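The stopping rule in the Methods can be expressed as a short check over the sequence of monitoring steps. This is a minimal sketch, not the authors' code: it assumes the calibration slope and the c-statistic optimism have already been estimated at each step (for instance by bootstrap internal validation), and the function names, the `Step` tuple layout, and the convention of reporting the last sample size of the qualifying run are hypothetical choices made for illustration. The trajectory values are invented.

```python
from typing import List, Optional, Sequence, Tuple

# One monitoring step: (sample size, calibration slope, optimism in the c-statistic).
# Assumption: both performance measures were already estimated at that step
# (e.g. by bootstrap internal validation); model fitting is outside this sketch.
Step = Tuple[int, float, float]


def sample_size_schedule(start: int = 100, increment: int = 50,
                         maximum: int = 3000) -> List[int]:
    """Sample sizes at which performance is re-estimated: 100, 150, ..., 3000."""
    return list(range(start, maximum + 1, increment))


def meets_criteria(slope: float, auc_optimism: float,
                   min_slope: float = 0.9, max_optimism: float = 0.02) -> bool:
    """True if calibration slope >= min_slope and AUC optimism <= max_optimism."""
    return slope >= min_slope and auc_optimism <= max_optimism


def stopping_sample_size(steps: Sequence[Step], consecutive: int = 2,
                         min_slope: float = 0.9,
                         max_optimism: float = 0.02) -> Optional[int]:
    """Return the first sample size at which the criteria have held at
    `consecutive` consecutive steps, or None if the rule is never met.

    Assumption (hypothetical): the reported size is the last step of the run.
    """
    run = 0
    for n, slope, optimism in steps:
        if meets_criteria(slope, optimism, min_slope, max_optimism):
            run += 1
            if run >= consecutive:
                return n
        else:
            run = 0  # a failed step breaks the run of consecutive successes
    return None


# Invented trajectory for illustration only; these are not results from the study.
trajectory = [
    (100, 0.72, 0.060),
    (150, 0.81, 0.041),
    (200, 0.91, 0.019),  # criteria met once
    (250, 0.88, 0.017),  # slope below 0.9, so the run resets
    (300, 0.92, 0.015),
    (350, 0.93, 0.012),  # second consecutive success: stop here
]
print(stopping_sample_size(trajectory))                     # 350
print(stopping_sample_size(trajectory, max_optimism=0.01))  # None
```

Passing `max_optimism=0.01` mirrors the stricter criterion mentioned in the Results. Keeping the rule separate from model fitting means the same check could be reused whether the model is fitted by plain maximum likelihood, with nonlinear terms, or with Firth's correction.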

Highlights

We suggest an adaptive sample size calculation method for developing clinical prediction models, in which model performance is monitored sequentially as new data comes in
Sample size rules of thumb refer to events per considered model coefficient in a regression analysis, but they are sometimes incorrectly interpreted in terms of events per variable in the final model
We aimed to extend a priori fixed sample size calculations with an adaptive approach that dynamically learns from model performance as new data comes in

Summary

Introduction

We suggest an adaptive sample size calculation method for developing clinical prediction models, in which model performance is monitored sequentially as new data comes in. Clinical prediction models, such as diagnostic and prognostic models, are ubiquitous in the literature [1,2,3]. A well-known rule of thumb is to have a minimum of 10 events per variable (EPV) in the smallest outcome group [4]; EPV > 20 has also been suggested [5]. These rules refer to events per considered model coefficient (excluding the intercept) in a regression analysis, yet they are sometimes incorrectly interpreted in terms of events per variable in the final model (i.e., excluding variables eliminated by any data-driven variable selection procedure). We will therefore use the term “events per candidate predictor parameter” (EPP) instead, in line with a recent publication [6].
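EPP and the minimum sample size implied by a fixed EPP target reduce to simple arithmetic. The sketch below is hypothetical (the function names are illustrative) and works with the expected size of the smaller outcome group, n × min(φ, 1 − φ) for event fraction φ, so it is not expected to reproduce EPP values computed on realised random samples.

```python
import math


def events_per_parameter(n: int, event_fraction: float, n_parameters: int) -> float:
    """Expected EPP: size of the smaller outcome group divided by the number of
    candidate predictor parameters (intercept excluded)."""
    events = n * event_fraction
    smaller_group = min(events, n - events)
    return smaller_group / n_parameters


def minimum_n_for_epp(target_epp: float, event_fraction: float,
                      n_parameters: int) -> int:
    """Smallest n whose expected smaller outcome group reaches target_epp."""
    smaller_fraction = min(event_fraction, 1 - event_fraction)
    return math.ceil(target_epp * n_parameters / smaller_fraction)


# Assumed setting: 33% event fraction, 7 candidate parameters (one linear term each).
print(round(events_per_parameter(450, 0.33, 7), 1))  # 21.2
print(minimum_n_for_epp(10, 0.33, 7))                # 213
print(minimum_n_for_epp(20, 0.33, 7))                # 425
```

Under these assumptions (the ovarian cancer event fraction and 7 predictors each entering as a single linear term), the 10 EPP rule of thumb would call for about 213 patients, well below the median of 450 at which the adaptive stopping rule was met.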

Journal: Diagnostic and Prognostic Research
Publication Date: Mar 22, 2021
Citations: 12
License type: open-access

Similar Papers

Toward a Modern Era in Clinical Prediction: The TRIPOD Statement for Reporting Prediction Models
Navdeep Tangri ... David M Kent
American Journal of Kidney Diseases | VOL. 65
15 Jan 2015

Prediction models: stepwise development and simultaneous validation is a step back
Georg Heinze ... Ben Van Calster
Journal of Clinical Epidemiology | VOL. 142
01 Aug 2021

Sample size determination for estimating antibody seroconversion rate under stable malaria transmission intensity.
Nuno Sepúlveda ... Chris Drakeley
Malaria Journal | VOL. 14
03 Apr 2015

Sample size calculation for differential expression analysis of RNA-seq data under Poisson distribution.
Yu Shyr ... Chung-I Li
International journal of computational biology and drug design | VOL. 6
01 Jan 2013
