Power enough? Confidence intervals for uncertainty

Paul G. McDonough

doi:10.1016/s0015-0282(02)04935-x

Abstract

The study by Garcia-Velasco and colleagues is an important addition to discussions of the relative merits of “blind” versus ultrasonography-guided embryo transfer. The group behind this study, headed by Antonio Pellicer and Carlos Simon, is now located in Madrid. One can be certain, with narrow confidence intervals, that this study will be frequently cited in the future literature on this topic. However, both Doctors Meldrum and Sallam caution against misinterpreting nonsignificant results when the analysis of the sample data cannot document a difference statistically. One of the difficulties of clinical testing is defining the study power necessary to detect real, clinically worthwhile differences. With a β of 0.20, an effect size of 15%, and a corresponding power of 80%, the authors have taken reasonable protection against a type II error. In his letter, Doctor Sallam suggests that the sample size estimates were overly optimistic in expecting an 80% probability of detecting a difference of 15% or greater between the two groups. The quantitative boundary designed to establish superiority seems large. One wonders whether an effect size of 15%, with a possible overall success rate of 65%, is beyond the inherent efficiency of most assisted reproductive programs. A smaller effect size would require more patients and may have influenced the prestudy power calculations. It is normal for researchers to feel uncertain about advance power calculations, because the best estimates of sample and effect size can be upset by within- or between-patient variation, recruitment patterns, and compliance. Despite the need for more flexibility in sample size adjustment using interim data, the power calculation does serve one important function: it ensures that the principal outcome measure for the trial is clearly defined (clinical pregnancy rates/fresh embryo transfers).
This is an important safeguard against claiming a positive or negative effect on an outcome that was not prespecified. For this reason, the post hoc subgroup comparisons across four different operators and their effect on pregnancy rates are interesting to read, but further hypothesis testing is necessary. Otherwise it is like betting on a horse after the race is over. At this point, now that the results of the trial are known, there is little merit in calculating the statistical power (post hoc power) of the Garcia-Velasco study (1). The uncertainty of the results is more appropriately indicated by an odds ratio with confidence intervals. Because the result was nonsignificant, the confidence interval in this study will include 1, but provision of the confidence intervals enables the reader to see clearly the coverage of the point estimates. Wide confidence intervals indicate uncertainty. The use of confidence intervals is particularly helpful for the reader when interpreting “negative results” and is preferable to post hoc power calculations.
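
The two calculations discussed above, a prestudy sample size estimate and an odds ratio with its confidence interval, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by Garcia-Velasco and colleagues: the two-sided α of 0.05, the 50% control-arm pregnancy rate (chosen so that a 15% difference gives the 65% success rate mentioned above), the standard normal-approximation formula for two proportions, the Woolf (log odds ratio) interval, and all patient counts are assumptions made for the example. The function names are likewise hypothetical.

```python
from math import ceil, exp, log, sqrt
from statistics import NormalDist


def sample_size_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Patients per arm for a two-sided comparison of two proportions.

    Uses the usual normal-approximation formula; alpha is an assumption,
    since the commentary states only beta (0.20) and the effect size.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)  # power = 1 - beta
    p_bar = (p_control + p_treated) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_control * (1 - p_control)
                        + p_treated * (1 - p_treated))
    ) ** 2
    return ceil(numerator / (p_control - p_treated) ** 2)


def odds_ratio_ci(events_a, n_a, events_b, n_b, level=0.95):
    """Odds ratio of group A versus group B with a Woolf (logit) interval."""
    a, b = events_a, n_a - events_a  # group A: pregnant / not pregnant
    c, d = events_b, n_b - events_b  # group B: pregnant / not pregnant
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Assumed rates: 50% versus 65% (a 15% difference), then 50% versus 60%.
    print("per arm, 15% difference:", sample_size_per_group(0.50, 0.65))
    print("per arm, 10% difference:", sample_size_per_group(0.50, 0.60))

    # Hypothetical counts, NOT the trial's data: 95/180 versus 88/180.
    or_, lo, hi = odds_ratio_ci(95, 180, 88, 180)
    print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

Under these assumptions the 15% difference calls for on the order of 170 patients per arm, and shrinking the difference to 10% more than doubles that figure, which is the point made above about smaller effect sizes. With the hypothetical counts, the interval spans 1, so the data are compatible both with no difference and with a clinically relevant one; the width of the interval, rather than a post hoc power figure, conveys that uncertainty.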
