Aiming for a representative sample: Simulating random versus purposive strategies for hospital selection.

Loan R Van Hoeven,Hendrik Koffijberg,Mart P Janssen,Kit C B Roes

doi:10.1186/s12874-015-0089-8

Open Access

https://doi.org/10.1186/s12874-015-0089-8

Abstract

Background: A ubiquitous issue in research is that of selecting a representative sample from the study population. While random sampling strategies are the gold standard, in practice, random sampling of participants is not always feasible nor necessarily the optimal choice. In our case, a selection must be made of 12 hospitals (out of 89 Dutch hospitals in total). With this selection of 12 hospitals, it should be possible to estimate blood use in the remaining hospitals as well. In this paper, we evaluate both random and purposive strategies for the case of estimating blood use in Dutch hospitals.

Methods: Available population-wide data on hospital blood use and number of hospital beds are used to simulate five sampling strategies: (1) select only the largest hospitals, (2) select the largest and the smallest hospitals (‘maximum variation’), (3) select hospitals randomly, (4) select hospitals from as many different geographic regions as possible, (5) select hospitals from only two regions. Simulations of each strategy result in different selections of hospitals, that are each used to estimate blood use in the remaining hospitals. The estimates are compared to the actual population values; the subsequent prediction errors are used to indicate the quality of the sampling strategy.

Results: The strategy leading to the lowest prediction error in the case study was maximum variation sampling, followed by random, regional variation and two-region sampling, with sampling the largest hospitals resulting in the worst performance. Maximum variation sampling led to a hospital level prediction error of 15 %, whereas random sampling led to a prediction error of 19 % (95 % CI 17 %-26 %). While lowering the sample size reduced the differences between maximum variation and the random strategies, increasing sample size to n = 18 did not change the ranking of the strategies and led to only slightly better predictions.

Conclusions: The optimal strategy for estimating blood use was maximum variation sampling. When proxy data are available, it is possible to evaluate random and purposive sampling strategies using simulations before the start of the study. The results enable researchers to make a more educated choice of an appropriate sampling strategy.

Electronic supplementary material: The online version of this article (doi:10.1186/s12874-015-0089-8) contains supplementary material, which is available to authorized users.
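The simulation idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the population is synthetic (`make_population` invents bed counts and blood use), the estimator is a simple blood-use-per-bed ratio, and the prediction error is taken as the mean absolute percentage error over the non-sampled hospitals. All function names are hypothetical, and only three of the five strategies (largest, maximum variation, random) are shown; the two regional strategies are left out for brevity.

```python
import random
import statistics


def make_population(n=89, seed=0):
    """Synthetic stand-in for the proxy data (NOT the real Dutch hospital data)."""
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        beds = rng.randint(100, 1200)
        # Assumption: blood use is roughly proportional to beds, with noise.
        use = beds * rng.uniform(8.0, 16.0)
        population.append({"beds": beds, "use": use})
    return population


def select_largest(pop, k, rng):
    """Strategy 1: the k hospitals with the most beds (rng is unused)."""
    order = sorted(range(len(pop)), key=lambda i: pop[i]["beds"], reverse=True)
    return order[:k]


def select_max_variation(pop, k, rng):
    """Strategy 2: the smallest and the largest hospitals (rng is unused)."""
    order = sorted(range(len(pop)), key=lambda i: pop[i]["beds"])
    half = k // 2
    return order[:half] + order[len(order) - (k - half):]


def select_random(pop, k, rng):
    """Strategy 3: simple random sample without replacement."""
    return rng.sample(range(len(pop)), k)


def prediction_error(pop, sample_idx):
    """Hospital-level mean absolute percentage error on non-sampled hospitals."""
    sampled = set(sample_idx)
    # Assumed estimator: blood use per bed, estimated from the sample only.
    rate = (sum(pop[i]["use"] for i in sampled)
            / sum(pop[i]["beds"] for i in sampled))
    errors = [
        abs(rate * h["beds"] - h["use"]) / h["use"]
        for i, h in enumerate(pop)
        if i not in sampled
    ]
    return statistics.mean(errors)


def simulate(pop, strategy, k=12, runs=200, seed=1):
    """Repeat the selection `runs` times and collect the prediction errors."""
    rng = random.Random(seed)
    return [prediction_error(pop, strategy(pop, k, rng)) for _ in range(runs)]


if __name__ == "__main__":
    pop = make_population()
    for name, strategy in [("largest", select_largest),
                           ("max variation", select_max_variation),
                           ("random", select_random)]:
        errors = simulate(pop, strategy)
        print(f"{name}: median error {statistics.median(errors):.3f}")
```

The deterministic strategies return the same selection on every run, so only the random strategy produces a spread of errors. Because the data here are invented, the ranking printed by this sketch says nothing about the ranking reported in the paper; the point is only the workflow of selecting, predicting for the remaining hospitals, and comparing against known values.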

Highlights

A ubiquitous issue in research is that of selecting a representative sample from the study population
For red blood cell products (RBC), fresh frozen plasma products (FFP) and platelet products (PLT), maximum variation sampling outperformed largest hospitals sampling in terms of hospital level error (Fig. 2)
Maximum variation sampling (MAXVAR) resulted in a 15 % prediction error at the hospital level for RBC, whereas the random strategies had slightly higher median errors for RBC, for FFP (30 %, 29 % and 34 % versus 25 % for MAXVAR), and for PLT (32 %, 31 % and 35 % versus 28 % for MAXVAR)

Summary

Introduction

A ubiquitous issue in research is that of selecting a representative sample from the study population. Random sampling is not always feasible in practice due to constraints in time, resources and costs, and researchers in the medical field often use a ‘convenience’ or a purposive sample, i.e. by choosing participants who are easy to recruit or by selecting participants based on preferences or expectations. The probability of randomly drawing an ‘unrepresentative’ sample is large if your population is small; the estimator is not robust, since data collection is done only once and not a thousand times. This can be illustrated by a study in the medical field that compared a randomized study design with a nonrandomized design. In a study estimating drug use characteristics, purposive samples were found to be sufficiently representative, as compared to probabilistic strategies, when these were drawn from a wide cross-section of participants and included a relatively large number of individuals [6]. Non-probabilistic strategies are sufficient at least in some cases.

Methods

Results

Conclusion