The meta-analysis reported in Vahey et al. (2015) concluded that the Implicit Relational Assessment Procedure (IRAP) has high clinical criterion validity (meta-analytic r̄ = .45) and therefore "the potential of the IRAP as a tool for clinical assessment" (p. 64). Vahey et al. (2015) also reported power analyses, and the article is frequently cited for sample size determination in IRAP studies, especially its heuristic of N > 37. This article attempts to verify those results. The results showed very poor reproducibility at almost every stage of data extraction and analysis, with errors generally biased towards inflating the effect size. The reported meta-analysis results were found to be mathematically implausible and could not be reproduced despite numerous attempts. Multiple internal discrepancies were found in the effect sizes, such as between the forest plot and the funnel plot, and between the forest plot and the supplementary data. Of the 56 individual effect sizes, 23 (41.1%) were not actually criterion effects and did not meet the original inclusion criteria. The original results were also undermined by combining effect sizes with different estimands. Re-extraction of effect sizes from the original articles revealed 360 additional effect sizes that met the inclusion criteria and should have been included in the original analysis. Examples of selection bias towards the inclusion of larger effect sizes were observed. A new meta-analysis was calculated to understand the compound impact of these errors (i.e., without endorsing its results as a valid estimate of the IRAP's criterion validity). The resulting effect size was half the size of the original (r̄ = .22), and the power analyses recommended sample sizes nearly 10 times larger than the original (N > 346), which no published original study using the IRAP has met. In aggregate, this seriously undermines the credibility and utility of the original article's conclusions and recommendations. Vahey et al. (2015) appears to need substantial correction at minimum. In particular, researchers should not rely on its results for sample size justification. A list of suggestions for error detection in meta-analyses is provided.
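
To illustrate why a smaller effect size calls for a much larger sample, the sketch below uses the textbook Fisher z approximation for the sample size needed to detect a correlation. This is an assumption chosen for illustration: it is not the power analysis procedure of Vahey et al. (2015) or of this article, and it is not expected to reproduce the thresholds reported above, which depend on each article's own analytic choices. The function name `required_n` and the defaults (two-tailed alpha = .05, power = .80) are hypothetical.

```python
from math import atanh, ceil
from statistics import NormalDist


def required_n(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect a correlation r (two-tailed test).

    Uses the Fisher z approximation: N = ((z_alpha + z_beta) / atanh(r))^2 + 3.
    Illustrative only; not the method used in either article.
    """
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    std_normal = NormalDist()  # standard normal distribution
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value, two-tailed
    z_beta = std_normal.inv_cdf(power)  # quantile for the desired power
    return ceil(((z_alpha + z_beta) / atanh(abs(r))) ** 2 + 3)


if __name__ == "__main__":
    # The two meta-analytic point estimates mentioned in the abstract.
    for r in (0.45, 0.22):
        print(f"r = {r:.2f}: approximate required N = {required_n(r)}")
```

Under this approximation the required N scales roughly with the inverse square of the effect size, so halving the assumed correlation more than quadruples the sample needed. Power analyses based on a more conservative quantity than the point estimate would demand larger samples still.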