Sparse canonical correlation analysis from a predictive point of view.

Ines Wilms,Christophe Croux

doi:10.1002/bimj.201400226

Abstract

Canonical correlation analysis (CCA) describes the associations between two sets of variables by maximizing the correlation between linear combinations of the variables in each dataset. However, in high-dimensional settings where the number of variables exceeds the sample size or when the variables are highly correlated, traditional CCA is no longer appropriate. This paper proposes a method for sparse CCA. Sparse estimation produces linear combinations of only a subset of variables from each dataset, thereby increasing the interpretability of the canonical variates. We consider the CCA problem from a predictive point of view and recast it into a regression framework. By combining an alternating regression approach together with a lasso penalty, we induce sparsity in the canonical vectors. We compare the performance with other sparse CCA techniques in different simulation settings and illustrate its usefulness on a genomic dataset.

Full Text