Representation-Constrained Canonical Correlation Analysis: A Hybridization of Canonical Correlation and Principal Component Analyses

S K Mishra

doi:10.2139/ssrn.1331886

Abstract

Classical canonical correlation analysis is extremely greedy in maximizing the squared correlation between two sets of variables. As a result, if one variable in dataset 1 is very highly correlated with a variable in dataset 2, the canonical correlation will be very high irrespective of the correlations among the remaining variables in the two datasets. We propose an alternative measure of association between two sets of variables that does not permit the greed of a select few variables to prevail over their fellow variables so much as to deprive the latter of contributing to their representative variables, or canonical variates. The proposed Representation-Constrained Canonical Correlation Analysis (RCCCA) has Classical Canonical Correlation Analysis (CCCA) at one end (λ = 0) and Classical Principal Component Analysis (CPCA) at the other (as λ becomes very large). In between, it gives a compromise solution. By a proper choice of λ, one can prevent a lone pair of highly correlated variables across the two datasets from hijacking the representation of both. This advantage of RCCCA over CCCA deserves serious attention from researchers who use statistical tools for data analysis.
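
The greed of the classical method described above can be illustrated with a small numerical sketch. It is not the RCCCA procedure itself, whose objective function is not stated in this abstract. It only computes the first classical canonical correlation on synthetic data in which a single cross-dataset pair of variables is nearly collinear and all other variables are independent noise. The data, the sample size, the noise level, and the helper name `first_canonical_correlation` are assumptions made for illustration; the QR-plus-SVD route is one standard way to obtain canonical correlations.

```python
import numpy as np

def first_canonical_correlation(X, Y):
    """Largest classical canonical correlation between the columns of X and Y.

    Uses the orthonormal-basis approach: the canonical correlations are the
    singular values of Qx^T Qy, where Qx and Qy are orthonormal bases of the
    centered column spaces. Assumes both matrices have full column rank.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(min(s[0], 1.0))  # guard against rounding slightly above 1

# Synthetic data (an assumption for illustration): four variables per dataset,
# all independent noise except one nearly collinear cross-dataset pair.
rng = np.random.default_rng(0)
n = 500
X = rng.standard_normal((n, 4))
Y = rng.standard_normal((n, 4))
Y[:, 0] = X[:, 0] + 0.05 * rng.standard_normal(n)

rho = first_canonical_correlation(X, Y)

# Cross-correlations between every X variable and every Y variable.
cross = np.corrcoef(X.T, Y.T)[:4, 4:]
mask = np.ones_like(cross, dtype=bool)
mask[0, 0] = False  # exclude the planted pair
max_other = float(np.abs(cross[mask]).max())

print("first canonical correlation:", rho)
print("correlation of the planted pair:", cross[0, 0])
print("largest |correlation| among all other cross pairs:", max_other)
```

In this setup the first canonical correlation is driven almost entirely by the single planted pair, while the remaining variables contribute essentially nothing to the canonical variates. This is the situation in which a representation constraint governed by λ is meant to help.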

Full Text