Abstract

It is well known that a categorical random variable can be represented geometrically by a simplex. Accordingly, several measures of association between categorical variables have been proposed and discussed in the literature. Moreover, the standard definitions of covariance and correlation coefficient for continuous random variables have been extended to categorical variables. In this article, we present a geometrical framework in which categorical and continuous data are represented by simplices and lines, respectively, in a high-dimensional space. We introduce a function whose direct minimization yields a single definition of covariance for categorical–categorical, categorical–continuous, and continuous–continuous data. The novelty of this general approach is that a single space and a single distance function describe both continuous and categorical data. It thus provides a unified geometrical description of measures of association, in particular between categorical and continuous data. We discuss the virtues and limitations of this geometrical framework and provide examples of possible applications to sociological surveys.
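The abstract does not spell out the paper's distance function, so the sketch below does not reproduce it. It only illustrates, under our own assumptions, the well-known starting point: each of the k categories is mapped to a vertex of the standard (k-1)-simplex in R^k (a one-hot vector), after which the ordinary cross-covariance with a continuous variable can be computed column by column. The helper names `to_simplex_vertices` and `cross_covariance` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical helpers for illustration only; NOT the covariance function defined in the paper.

def to_simplex_vertices(labels):
    """Map each category label to a vertex of the standard simplex (a one-hot row vector)."""
    categories = sorted(set(labels))
    index = {c: i for i, c in enumerate(categories)}
    vertices = np.zeros((len(labels), len(categories)))
    for row, label in enumerate(labels):
        vertices[row, index[label]] = 1.0
    return categories, vertices

def cross_covariance(a, b):
    """Population cross-covariance matrix between two data sets (rows are observations)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]  # treat a 1-D input as a single column
    if b.ndim == 1:
        b = b[:, None]
    a_centered = a - a.mean(axis=0)
    b_centered = b - b.mean(axis=0)
    return a_centered.T @ b_centered / a.shape[0]

# Toy survey-style data (invented for illustration).
labels = ["yes", "no", "yes", "yes"]
income = [3.0, 1.0, 4.0, 2.0]
categories, vertices = to_simplex_vertices(labels)
print(categories)                            # ['no', 'yes']
print(cross_covariance(vertices, income))    # one covariance per category vertex
```

With these inputs the two entries are -0.375 and 0.375: they sum to zero because every simplex vertex has coordinates summing to one. Also, the trace of `cross_covariance(vertices, vertices)` equals 1 - sum(p_i^2), the mean squared distance of the vertices from their centroid, which hints at why a distance-based view of covariance is natural on the simplex.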

