Abstract

SUMMARY

Given a parametric model of dependence between two random quantities, X and Y, the notion of information gain can be used to define a measure of correlation. This definition of correlation generalizes both the usual product-moment correlation coefficient for the bivariate normal model and the multiple correlation coefficient in the standard linear regression model. The use of this information-based correlation in a descriptive statistical analysis is examined and several examples are given.

If the dependence between two random quantities, X and Y, is modelled parametrically, then the concept of information gain can be used to define a measure of correlation. This correlation coefficient can appear in two possible contexts, depending on whether one models the joint distribution of X and Y, or just the conditional distribution of Y given X.

An important motivating feature for this information-based correlation coefficient is the fact that it generalizes both the usual product-moment correlation coefficient for the bivariate normal model and the usual multiple correlation coefficient for the standard multiple regression model with normal errors. Our intuition is well developed for these usual correlation coefficients, and we hope that it will still apply to the information-based correlation in more general situations of parametric dependence. Further, since our correlation coefficient is based on information gain, we might hope to extend this intuition to interpret the information gain in any statistical modelling situation where we want to assess how much better a more complicated model is than a simpler one.

The concept of information gain for general statistical models is described in § 2. This concept is then used to define an information-based measure of correlation; the joint case is covered in § 3 and the conditional case in § 4.
Estimation of the correlation coefficient is carried out by estimating the corresponding information gain; see §§ 5-7. The use of information gain for the purpose of model choice in a descriptive statistical analysis is discussed in § 8, and a comparison between our approach and Akaike's information criterion is given in § 9. Some examples of the use of this information-based correlation are given in §§ 10 and 11.
