Abstract

Vegas et al. (IEEE Trans Softw Eng 42(2):120–135, 2016) raised concerns about the use of AB/BA crossover designs in empirical software engineering studies. This paper addresses issues related to calculating standardized effect sizes and their variances that were not addressed by the Vegas et al. paper. In a repeated measures design, such as an AB/BA crossover design, each participant uses each method. There are two major implications of this that have not been discussed in the software engineering literature. Firstly, there are potentially two different standardized mean difference effect sizes that can be calculated, depending on whether the mean difference is standardized by the pooled within-groups variance or the within-participants variance. Secondly, as for any estimated parameter, and also for the purposes of undertaking meta-analysis, it is necessary to calculate the variance of the standardized mean difference effect sizes (which is not the same as the variance of the study). We present the model underlying the AB/BA crossover design and provide two examples to demonstrate how to construct the two standardized mean difference effect sizes and their variances, both from standard descriptive statistics and from the outputs of statistical software. Finally, we discuss the implications of these issues for reporting and planning software engineering experiments. In particular, we consider how researchers should choose between a crossover design and a between-groups design.

Highlights

  • Vegas et al (2016) reported that many software engineering experiments had used AB/BA crossover designs but the reports of the experiments did not use the correct terminology

  • Participants are split into two groups, and each participant in one group uses technique A first and subsequently uses technique B to perform the same task using materials related to a different software system or component

  • We provide a discussion of the model underlying the AB/BA crossover design, so that issues connected with the construction of effect sizes and effect size variances can be properly understood


Introduction

Vegas et al (2016) reported that many software engineering experiments had used AB/BA crossover designs but the reports of the experiments did not use the correct terminology. Seventeen papers did not take account of participant variability in their analysis, which is the main rationale for using a repeated measures design such as a crossover. Vegas et al explain both the terminology used to describe a crossover design and how to analyze a crossover design correctly. A repeated measures design is one where an individual participant contributes more than a single outcome value. In the case of a simple AB/BA crossover design, A refers to one software engineering technique, B refers to another, and the goal of the design is to determine which technique delivers the better outcome.
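To make the distinction between the two standardized mean difference effect sizes concrete, the following sketch computes both from a small set of hypothetical paired outcomes (the data values and variable names are illustrative, not taken from the paper). One effect size standardizes the mean difference by the pooled within-groups standard deviation, the other by the standard deviation of the within-participant differences; the within-participant correlation links the two.

```python
import math
import statistics as st

# Hypothetical paired outcomes from a small crossover: each participant
# has one score under technique A and one under technique B.
a = [10.0, 12.0, 9.0, 11.0, 13.0, 10.5]
b = [12.5, 13.0, 11.0, 13.5, 14.0, 12.0]
n = len(a)

diffs = [bi - ai for ai, bi in zip(a, b)]
mean_diff = st.mean(diffs)

# Pooled within-groups standard deviation (between-participants spread):
s_pooled = math.sqrt((st.variance(a) + st.variance(b)) / 2)

# Within-participants standard deviation (spread of the paired differences):
s_diff = st.stdev(diffs)

d_between = mean_diff / s_pooled  # comparable to a between-groups SMD
d_within = mean_diff / s_diff     # repeated-measures SMD

# Within-participant correlation, computed by hand for portability:
ma, mb = st.mean(a), st.mean(b)
r = (sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
     / ((n - 1) * st.stdev(a) * st.stdev(b)))
# When the two sets of scores have similar variances,
# d_within ≈ d_between / sqrt(2 * (1 - r)), so the two effect sizes
# can differ substantially whenever r is large.
```

Because the within-participant correlation is usually positive (and often large), the repeated-measures effect size is typically bigger than the between-groups one for the same data, which is why reports must state clearly which standardizer was used.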
