Omitting correlated variables

L Jenkins, M Anderson

doi:10.5784/18-0-183

Abstract

Data collected on the physical, biological or man-made world are often highly correlated, posing the question of whether fewer variables would contain almost as much information. A crude solution is simply to look at the Pearson correlation matrix and omit one of each pair of highly correlated variables. A more systematic method is to condition on one or more variables and observe the resulting partial covariance matrix. If the remaining variables have little variance after the conditioning, then the conditioning variables contain most of the information of all the original variables. Paralleling the usual tests applied in judging how many principal components are sufficient to represent all the data, we can use the amount of variance explained by the conditioning variable(s) as a measure of information content. The paper references earlier work in this area, explains the computation and includes examples using published data sets. The approach is found to be highly competitive with using principal components, and has the obvious advantage over principal components of simply omitting some of the original variables from further consideration. The method has been coded in Visual Basic add-ins to an Excel spreadsheet.
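The conditioning step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' Visual Basic implementation: the function names `partial_covariance` and `variance_explained` are hypothetical, and it assumes that "variance explained" is measured as the share of the total variance (the trace of the covariance or correlation matrix) that disappears after conditioning, by analogy with the usual principal components convention.

```python
import numpy as np

def partial_covariance(cov, keep):
    """Covariance of the omitted variables after conditioning on `keep`.

    `cov` is a covariance (or correlation) matrix; `keep` lists the
    indices of the conditioning (retained) variables.
    """
    cov = np.asarray(cov, dtype=float)
    keep = list(keep)
    rest = [i for i in range(cov.shape[0]) if i not in keep]
    s_rr = cov[np.ix_(rest, rest)]   # omitted vs omitted
    s_rk = cov[np.ix_(rest, keep)]   # omitted vs retained
    s_kk = cov[np.ix_(keep, keep)]   # retained vs retained
    # Schur complement: S_rr - S_rk S_kk^{-1} S_kr
    return s_rr - s_rk @ np.linalg.solve(s_kk, s_rk.T)

def variance_explained(cov, keep):
    """Share of total variance accounted for by the retained variables.

    Assumption: total variance is the trace of `cov`.
    """
    cov = np.asarray(cov, dtype=float)
    residual = np.trace(partial_covariance(cov, keep))
    return 1.0 - residual / np.trace(cov)

# Two variables with correlation 0.9: keeping one leaves a residual
# variance of 1 - 0.9**2 for the other.
corr = np.array([[1.0, 0.9],
                 [0.9, 1.0]])
share = variance_explained(corr, keep=[0])
```

Comparing `variance_explained` across candidate subsets of the same size would then indicate which variables are best retained, under the assumptions stated above.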

Highlights

In studying physical and social phenomena, it often happens that two observed variables are highly correlated with one another.
Though it is hardly obvious from the correlation matrix for these data, any two of the three variables contain all the information of the three variables. (X3 was calculated as X3 = 0.6 X1 - 0.7 X2 + 3.0.)
The first principal component can account for 58.72% of the variance of all three variables, and two principal components account for all the variance.
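The three-variable highlight can be checked numerically with a short sketch. The data here are simulated, not the paper's: X1 and X2 are assumed to be independent standard normal draws, so the 58.72% figure will not be reproduced. Only the structural claims are illustrated, namely that conditioning on any two variables leaves no residual variance in the third, and that two principal components account for all the variance. The helper `residual_variance` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)            # assumed distribution, for illustration only
x2 = rng.normal(size=n)
x3 = 0.6 * x1 - 0.7 * x2 + 3.0     # exact linear combination from the text

data = np.column_stack([x1, x2, x3])
cov = np.cov(data, rowvar=False)

def residual_variance(cov, keep, target):
    """Variance of `target` left after conditioning on the variables in `keep`."""
    s_kk = cov[np.ix_(keep, keep)]
    s_tk = cov[target, keep]
    return cov[target, target] - s_tk @ np.linalg.solve(s_kk, s_tk)

# Condition on each pair in turn and look at the variable left out.
residuals = {
    (0, 1): residual_variance(cov, [0, 1], 2),
    (0, 2): residual_variance(cov, [0, 2], 1),
    (1, 2): residual_variance(cov, [1, 2], 0),
}

# Cumulative share of variance carried by the principal components.
eigenvalues = np.sort(np.linalg.eigvalsh(cov))[::-1]
pc_share = np.cumsum(eigenvalues) / eigenvalues.sum()

for pair, value in residuals.items():
    print(pair, f"{value:.2e}")
print("cumulative PC share:", pc_share)
```

Each residual variance is zero up to floating-point rounding, and the cumulative share reaches 1 at the second component, which matches the statement that any two of the three variables carry all the information.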

Summary

INTRODUCTION

In studying physical and social phenomena, it often happens that two observed variables are highly correlated with one another. A common approach is to use the multivariate method of principal components, or its extension into factor analysis. This technique does not directly address the basic question of whether all the original variables yield much more information than just some subset of them. In this paper we present a statistical method that measures the amount of information lost by omitting one or more variables from a set of correlated observations, and thereby identifies which variables are best retained. This is primarily an ex-post analysis, used when we are interested in reducing the total number of variables to allow the underlying phenomena to be understood more .

PREVIOUS RESEARCH

NOTATION AND PREPARATION

SELECTING VARIABLES BASED ON PARTIAL COVARIANCE

COMPUTATION

PRINCIPAL COMPONENTS ANALYSIS

ILLUSTRATIVE RESULTS

EXTENSIONS


Journal: ORiON	Publication Date: Jan 1, 2014
Citations: 1	License type: cc-by
