Abstract

A purely data-based index for detecting bivariate association is proposed for preliminary data exploration when modelling a dependent variable that may be associated with a large number of independent variables. No particular form of association between the dependent and independent variables is assumed. The proposed index is the value p, the probability that a scatter plot created by an X-randomization will have a smaller mean nearest neighbour distance than the observed scatter plot. The rationale is that randomizing an existing X-Y association will usually produce a scatter plot with a greater mean nearest neighbour distance. The process is repeated for each of the other independent variables, giving a separate p for each one. A subset of potentially informative independent variables is then obtained by selecting those with low p values, although how small p must be is left to the user.
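The procedure can be illustrated with a minimal Monte Carlo sketch. It is not the authors' implementation and rests on several assumptions not stated above: X-randomization is taken to mean a random permutation of the X values, distances are Euclidean, both axes are standardized first so that neither variable's scale dominates, and p is estimated as the fraction of a fixed number of permutations whose mean nearest neighbour distance is strictly smaller than the observed one. The function names `mean_nn_distance`, `association_p` and `screen_variables` are hypothetical.

```python
import numpy as np


def mean_nn_distance(x, y):
    """Mean Euclidean nearest neighbour distance among the points (x_i, y_i)."""
    pts = np.column_stack((x, y))
    # Brute-force pairwise distances: O(n^2) memory, fine for moderate n.
    diff = pts[:, None, :] - pts[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    return d.min(axis=1).mean()


def association_p(x, y, n_perm=999, rng=None):
    """Estimate p: the fraction of X-randomized scatter plots whose mean
    nearest neighbour distance is smaller than that of the observed plot.

    Assumptions (not from the source): X-randomization is a random
    permutation of x, and both axes are standardized to unit variance.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.std() == 0 or y.std() == 0:
        raise ValueError("constant variable: association is undefined")
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    observed = mean_nn_distance(x, y)
    smaller = 0
    for _ in range(n_perm):
        # Permuting x breaks any existing X-Y association.
        if mean_nn_distance(rng.permutation(x), y) < observed:
            smaller += 1
    return smaller / n_perm


def screen_variables(X, y, n_perm=999, rng=None):
    """Return one p value per column (independent variable) of X."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    return [association_p(X[:, j], y, n_perm, rng) for j in range(X.shape[1])]
```

For example, with `y = x ** 2` plus a little noise and an unrelated column `z`, `screen_variables(np.column_stack((x, z)), y)` should give a p near 0 for `x` (a non-linear association that a correlation coefficient near zero would miss) and an unremarkable p for `z`. Any cutoff used to call a p value "low", such as 0.05, is the user's choice, as the abstract notes.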
