BackgroundThe minimum sample size for multistakeholder Delphi surveys remains understudied. Drawing from three large international multistakeholder Delphi surveys, this study aimed to: 1) investigate the effect of increasing sample size on replicability of results; 2) assess whether the level of replicability of results differed with participant characteristics: e.g., gender, age, profession. MethodsWe used data from Delphi surveys to develop guidance for improved reporting of healthcare intervention trials: SPIRIT (Standard Protocol Items: Recommendations for Interventional Trials) and CONSORT (Consolidated Standards of Reporting Trials) extension for surrogate endpoints (n=175, 22 items rated); CONSORT-SPI, extension for social and psychological interventions (n=333, 77 items rated); and core outcome set for burn care (n=553, 88 items rated). Resampling with replacement was used to draw random subsamples from the participant data set in each of the three surveys. For each subsample, the median value of all rated survey items was calculated and compared to the medians from the full participant data set. The median number (and interquartile range) of medians replicated was used to calculate the percentage replicability (and variability). High replicability was defined as ≥80% and moderate as 60% and <80% ResultsThe average median replicability (variability) as a percentage of total number of items rated from the three datasets was 81% (10%) at a sample size of 60. In one of the datasets (CONSORT-SPI), a ≥80% replicability was reached at a sample size of 80. On average, increasing the sample size from 80 to 160 increased the replicability of results by a further 3% and reduced variability by 1%. For subgroup analysis based on participant characteristics (e.g. gender, age, professional role), using resampled samples of 20 to 100 showed that a sample size of 20 to 30 resulted to moderate replicability levels of 64 to 77%. ConclusionWe found that a minimum sample size of 60 to 80 participants in multistakeholder Delphi surveys provide a high level of replicability (≥80%) in the results. For Delphi studies limited to individual stakeholder groups (such as researchers, clinicians, patients), a sample size of 20 to 30 per group may be sufficient.