Abstract
This chapter first describes the following classical analysis of variance (ANOVA) tests for comparing more than two means: one-way ANOVA, a generalised form of the unpaired t-test (Sect. 3.1); two-way ANOVA without replication, a generalised form of the paired t-test (Sect. 3.2); and two-way ANOVA with replication, which considers the interaction between two factors (e.g. topic and system). The first two types of ANOVA are particularly important for IR researchers. In laboratory experiments where systems are evaluated using topics, there is usually only one evaluation measure score per topic-system pair (unless, for example, the system is considered to be nondeterministic and produces a different search result page every time the same query is entered), so the topic-system interaction cannot be discussed. (Banks et al. (Inf Retr 1:7–34, 1999) applied Tukey’s single-degree-of-freedom test for nonadditivity and Mandel’s bundle-of-lines approach to examine topic-system interaction in TREC-3 data, which have the structure of two-way ANOVA without replication, and reported: “there is a strong interaction between system and topic in terms of average precision. The presence of interaction implies that one cannot find simple descriptions of the data in terms of topics and systems alone.” These tests are beyond the scope of this book.) This chapter then describes how one-way ANOVA and two-way ANOVA without replication can easily be conducted using Excel (Sect. 3.4) and R (Sect. 3.5). (For other types of ANOVA with R, we refer readers to Crawley (Statistics: an introduction using R, 2nd edn. Wiley, Chichester, 2015), Chapter 8.) Finally, it describes how a confidence interval for each system can be constructed from the data used in the first two types of ANOVA (Sect. 3.6).
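The difference between the first two ANOVA types is easiest to see on a small topic-by-system score matrix. The following is a minimal pure-Python sketch, assuming a complete matrix with exactly one score per topic-system pair (the no-replication setting described above) and equal numbers of topics per system. The function names, the `scores[i][j]` layout (system i, topic j), and the confidence-interval helper are our own illustration, not code from the chapter. The helper assumes the common construction "mean ± t × sqrt(V_E / n)", with the critical t value supplied by the caller. Only F statistics are computed; p-values require the F distribution, which Excel or R can provide.

```python
from math import sqrt
from typing import Dict, List, Tuple

# Hypothetical layout: scores[i][j] is the evaluation score of system i on topic j,
# with exactly one score per topic-system pair (no replication).
Matrix = List[List[float]]


def one_way_anova(scores: Matrix) -> Dict[str, float]:
    """One-way ANOVA: the topic scores of each system are treated as independent samples."""
    m, n = len(scores), len(scores[0])  # m systems, n topics each (equal sizes assumed)
    grand = sum(sum(row) for row in scores) / (m * n)
    sys_means = [sum(row) / n for row in scores]
    s_a = n * sum((mu - grand) ** 2 for mu in sys_means)  # between-system variation
    s_e = sum((x - mu) ** 2 for row, mu in zip(scores, sys_means) for x in row)  # within-system
    df_a, df_e = m - 1, m * (n - 1)
    v_a, v_e = s_a / df_a, s_e / df_e  # mean squares
    return {"F": v_a / v_e, "df_A": df_a, "df_E": df_e, "V_E": v_e}


def two_way_anova_no_rep(scores: Matrix) -> Dict[str, float]:
    """Two-way ANOVA without replication: topic is a second factor; no interaction term."""
    m, n = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (m * n)
    sys_means = [sum(row) / n for row in scores]
    topic_means = [sum(scores[i][j] for i in range(m)) / m for j in range(n)]
    s_t = sum((x - grand) ** 2 for row in scores for x in row)  # total variation
    s_a = n * sum((mu - grand) ** 2 for mu in sys_means)  # system effect
    s_b = m * sum((mu - grand) ** 2 for mu in topic_means)  # topic effect
    s_e = s_t - s_a - s_b  # residual variation
    df_a, df_b, df_e = m - 1, n - 1, (m - 1) * (n - 1)
    v_e = s_e / df_e
    return {"F_system": (s_a / df_a) / v_e, "F_topic": (s_b / df_b) / v_e,
            "df_A": df_a, "df_B": df_b, "df_E": df_e, "V_E": v_e}


def confidence_interval(system_scores: List[float], v_e: float,
                        t_crit: float) -> Tuple[float, float]:
    """Assumed construction: mean +/- t_crit * sqrt(V_E / n), where t_crit is the
    two-sided critical t value for the residual degrees of freedom (caller-supplied)."""
    n = len(system_scores)
    mean = sum(system_scores) / n
    moe = t_crit * sqrt(v_e / n)  # margin of error
    return mean - moe, mean + moe
```

With the toy matrix `[[1, 2, 3], [2, 4, 3], [3, 3, 6]]`, one-way ANOVA gives F = 1.8 on (2, 6) degrees of freedom. Two-way ANOVA without replication removes the topic variation from the residual and gives F = 3.0 on (2, 4) degrees of freedom. Treating topic as a factor is what makes this ANOVA the generalisation of the paired t-test.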