When dealing with nominal categorical data, it is often desirable to know the degree of association or dependence between the categorical variables. Although a great many association measures have been proposed over the years, they all yield widely varying, contradictory, and unreliable results because they lack an important property: value validity. After discussing the value-validity property, this paper introduces a new measure of association (dependence) based on the mean Euclidean distance between probability distributions, one of which is the distribution under independence. Both the asymmetric form, for use when one variable can be considered the explanatory (independent) variable and the other the response (dependent) variable, and the symmetric form of the measure are introduced. Particular emphasis is given to the important 2 × 2 case, in which each variable has two categories, but the general case of any number of categories is also covered. Besides having the value-validity property, the new measure satisfies the other prerequisites of a good association measure. Comparisons are made with the well-known Goodman–Kruskal lambda and tau measures. A statistical inference procedure for the new measure is also derived, and numerical examples are provided.