Abstract

There are many different proposed procedures for sample size planning for the Wilcoxon‐Mann‐Whitney test at given type‐I and type‐II error rates α and β, respectively. Most methods assume very specific models or types of data to simplify calculations (eg, ordered categorical or metric data, location shift alternatives, etc). We present a unified approach that covers metric data with and without ties, count data, ordered categorical data, and even dichotomous data. For that, we calculate the unknown theoretical quantities such as the variances under the null and relevant alternative hypothesis by considering the following “synthetic data” approach. We evaluate data whose empirical distribution functions match the theoretical distribution functions involved in the computations of the unknown theoretical quantities. Then, well‐known relations for the ranks of the data are used for the calculations.

In addition to computing the necessary sample size N for a fixed allocation proportion t = n1/N, where n1 is the sample size in the first group and N = n1 + n2 is the total sample size, we provide an interval for the optimal allocation rate t, which minimizes the total sample size N. It turns out that, for certain distributions, a balanced design is optimal. We give a characterization of such distributions. Furthermore, we show that the optimal choice of t depends on the ratio of the two variances, which determine the variance of the Wilcoxon‐Mann‐Whitney statistic under the alternative. This is different from an optimal sample size allocation in case of the normal distribution model.
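The "synthetic data" idea can be illustrated with a minimal sketch. It is not taken from the paper; it assumes the standard asymptotic normal approximation for the Wilcoxon‐Mann‐Whitney statistic and treats two prior samples as if their empirical distributions were the true ones. The function names (`norm_ecdf`, `pop_var`, `wmw_sample_size`) and the example data are hypothetical. Ties are handled through the normalized (mid-point) empirical distribution function, which is equivalent to working with mid-ranks.

```python
from math import ceil, sqrt
from statistics import NormalDist


def norm_ecdf(sample, z):
    """Normalized empirical CDF: P(X < z) + 0.5 * P(X = z)."""
    return sum((v < z) + 0.5 * (v == z) for v in sample) / len(sample)


def pop_var(values):
    """Variance with divisor len(values): the data are treated as the population."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def wmw_sample_size(x1, x2, alpha=0.05, power=0.8, t=0.5):
    """Approximate total sample size N for allocation t = n1 / N.

    Assumption: x1 and x2 are synthetic (prior) data whose empirical
    distributions stand in for the theoretical F1 and F2.
    Returns (N, p), where p = P(X1 < X2) + 0.5 * P(X1 = X2).
    """
    pooled = list(x1) + list(x2)
    # Relative effect p: mean of the normalized ECDF of group 1 over group 2.
    p = sum(norm_ecdf(x1, z) for z in x2) / len(x2)
    if p == 0.5:
        raise ValueError("p = 1/2: no effect to detect, N is unbounded")
    # Null variance from the pooled data; alternative variances from placements.
    sigma0_sq = pop_var([norm_ecdf(pooled, z) for z in pooled])
    sigma1_sq = pop_var([norm_ecdf(x2, z) for z in x1])
    sigma2_sq = pop_var([norm_ecdf(x1, z) for z in x2])
    u_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    u_beta = NormalDist().inv_cdf(power)
    sd_alt = sqrt(t * sigma2_sq + (1 - t) * sigma1_sq)
    n_total = (u_alpha * sqrt(sigma0_sq) + u_beta * sd_alt) ** 2 / (
        t * (1 - t) * (p - 0.5) ** 2
    )
    return n_total, p


# Hypothetical prior data with ties between the groups.
n_total, p = wmw_sample_size([1, 2, 3, 4], [3, 4, 5, 6], alpha=0.05, power=0.8)
print(f"relative effect p = {p}, total N = {ceil(n_total)}")
```

Because only comparisons between observations enter the computation, the same sketch applies to metric, count, ordered categorical, or dichotomous data coded as ordered numbers.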

Highlights

  • The comparison of two independent samples is widespread in medicine, the life sciences in general, and other fields of research

  • We simulate how the chosen type-I and type-II error rates affect the value of the optimal allocation rate t

  • In the same way as before, it is possible to construct an interval for the optimal allocation rate t0, which is given by [I1(0), I2], where the lower bound is u1−α∕2σ 2u1−α∕2 σ + u1−β σ2

Summary

INTRODUCTION

The comparison of two independent samples is widespread in medicine, the life sciences in general, and other fields of research. Bürkner et al showed for symmetric continuous distributions under a location shift model that a balanced design is optimal for the Wilcoxon‐Mann‐Whitney (WMW) test. For general distributions, they observed in simulation studies that, in many situations, the difference between using the optimal t and using a balanced design is negligible. See the chapter “Keeping Observed Data as a Theoretical Distribution” in the work of Puntanen et al for a similar approach in the parametric case. We show in which cases more subjects should be allocated to the first or second group. We apply this method to several data examples with different types of data and provide power simulations to show that, with the sample size calculated by our method, the simulated power is at least 1 − β. We simulate how the chosen type-I and type-II error rates affect the value of the optimal allocation rate t.
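How the allocation rate t affects the total sample size can be explored numerically. The following is a minimal sketch, not the paper's procedure: it assumes the asymptotic normal approximation with a null standard deviation `sigma0` that does not depend on t, and with `sigma1` and `sigma2` denoting the standard deviations that make up the variance of the statistic under the alternative. All names and input values are hypothetical. A plain grid search replaces any closed-form interval.

```python
from math import sqrt
from statistics import NormalDist


def total_n(t, p, sigma0, sigma1, sigma2, alpha=0.05, power=0.8):
    """Approximate total sample size N for allocation rate t = n1 / N."""
    u_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    u_beta = NormalDist().inv_cdf(power)
    sd_alt = sqrt(t * sigma2 ** 2 + (1 - t) * sigma1 ** 2)
    return (u_alpha * sigma0 + u_beta * sd_alt) ** 2 / (
        t * (1 - t) * (p - 0.5) ** 2
    )


def best_allocation(p, sigma0, sigma1, sigma2, alpha=0.05, power=0.8, steps=1000):
    """Grid search over t in (0, 1) for the allocation minimizing N."""
    grid = [i / steps for i in range(1, steps)]
    return min(
        grid, key=lambda t: total_n(t, p, sigma0, sigma1, sigma2, alpha, power)
    )


# Equal variances under the alternative: the balanced design is best.
print(best_allocation(p=0.65, sigma0=0.29, sigma1=0.25, sigma2=0.25))

# Larger variance in the first group: more subjects go to group 1 (t > 0.5).
print(best_allocation(p=0.65, sigma0=0.29, sigma1=0.40, sigma2=0.20))

# Changing the error rates shifts the optimum when the variances differ.
for alpha, power in [(0.05, 0.8), (0.05, 0.95), (0.01, 0.8)]:
    t_opt = best_allocation(0.65, 0.29, 0.40, 0.20, alpha=alpha, power=power)
    print(alpha, power, t_opt)
```

Under these assumptions the minimizer lies between 1/2 and `sigma1 / (sigma1 + sigma2)`, since the null term alone is minimized at t = 1/2 and the alternative term alone at `sigma1 / (sigma1 + sigma2)`.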

SAMPLE SIZE FORMULA
Interval for the optimal design
Optimality of a balanced design
DATA EXAMPLES
Number of seizures in an epilepsy trial
Irritation of the nasal mucosa
Kidney weights
Albumin in urine
SIMULATIONS FOR THE OPTIMAL DESIGN
DISCUSSION