The influence of base rates on correlations: An evaluation of proposed alternative effect sizes with real-world data.

Kelly M Babchishin, Leslie-Maaike Helmus

doi:10.3758/s13428-015-0627-7

Open Access

Abstract

Correlations are the simplest and most commonly understood effect size statistic in psychology. The purpose of the current paper was to use a large sample of real-world data (109 correlations with 60,415 participants) to illustrate the base rate dependence of correlations when applied to dichotomous or ordinal data. Specifically, we examined the influence of the base rate on different effect size metrics. Correlations decreased when the dichotomous variable did not have a 50% base rate: the greater the deviation from a 50% base rate, the smaller the observed Pearson's point-biserial and Kendall's tau correlation coefficients. In contrast, the relationship between base rate deviations and the commonly proposed alternatives (i.e., polychoric correlation coefficients, AUCs, Pearson/Thorndike adjusted correlations, and Cohen's d) was weaker, with AUCs being the most robust to attenuation due to base rates. In other words, the base rate makes a marked difference in the magnitude of the correlation. As such, when using dichotomous data, the correlation may be more sensitive to base rates than is optimal for the researcher's goals. Given the magnitude of the association between base rate deviation and point-biserial correlations (r = -.81) and Kendall's tau (r = -.80), we recommend that AUCs, Pearson/Thorndike adjusted correlations, Cohen's d, or polychoric correlations be considered as alternative effect size statistics in many contexts.
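As a rough illustration of the attenuation described above (a textbook sketch, not the authors' analysis), the following Python code holds the group difference fixed and varies only the base rate p. It assumes the continuous variable is normally distributed with equal variance in each group. Under that assumption the population point-biserial correlation is r = d / sqrt(d^2 + 1/(pq)) with q = 1 - p, and the AUC is Phi(d / sqrt(2)). The function names and the value d = 0.8 are hypothetical choices made only for this example.

```python
import math


def point_biserial_from_d(d: float, p: float) -> float:
    """Population point-biserial r for a 0/1 group indicator with base rate p.

    Assumes (for illustration) equal-variance normal scores within each group
    that differ by d pooled standard deviations. Then
    r = d / sqrt(d**2 + 1 / (p * q)), where q = 1 - p.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("base rate p must lie strictly between 0 and 1")
    q = 1.0 - p
    return d / math.sqrt(d * d + 1.0 / (p * q))


def auc_from_d(d: float) -> float:
    """AUC under the same equal-variance binormal assumption: Phi(d / sqrt(2)).

    Because Phi(x) = 0.5 * (1 + erf(x / sqrt(2))), this simplifies to
    0.5 * (1 + erf(d / 2)). Note that it does not depend on the base rate.
    """
    return 0.5 * (1.0 + math.erf(d / 2.0))


if __name__ == "__main__":
    d = 0.8  # hypothetical, fixed standardized group difference
    for p in (0.50, 0.30, 0.10, 0.05, 0.01):
        r = point_biserial_from_d(d, p)
        print(f"base rate {p:.2f}: d = {d:.2f}, AUC = {auc_from_d(d):.3f}, r_pb = {r:.3f}")
```

Under these assumptions, r_pb falls from about .37 at a 50% base rate to about .08 at a 1% base rate, while d and the AUC (about .71) stay the same. The formula is symmetric in p and q, so a 10% and a 90% base rate attenuate the correlation equally.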

Full Text