Classical Canonical Correlation Analysis Research Articles

Background: With the rapid development of new genetic measurement methods, several types of genetic alterations can be quantified in a high-throughput manner. While the initial focus has been on investigating each data set separately, there is an increasing interest in studying the correlation structure between two or more data sets. Multivariate methods based on Canonical Correlation Analysis (CCA) have been proposed for integrating paired genetic data sets. The high dimensionality of microarray data imposes computational difficulties, which have been addressed, for instance, by studying the covariance structure of the data, or by reducing the number of variables prior to applying the CCA. In this work, we propose a new method for analyzing high-dimensional paired genetic data sets, which mainly emphasizes the correlation structure and still permits efficient application to very large data sets. The method is implemented by translating a regularized CCA to its dual form, where the computational complexity depends mainly on the number of samples instead of the number of variables. The optimal regularization parameters are chosen by cross-validation. We apply the regularized dual CCA, as well as a classical CCA preceded by a dimension-reducing Principal Components Analysis (PCA), to a paired data set of gene expression changes and copy number alterations in leukemia.

Results: Using the correlation-maximizing methods, regularized dual CCA and PCA+CCA, we show that without pre-selection of known disease-relevant genes, and without using information about clinical class membership, an exploratory analysis singles out two patient groups, corresponding to well-known leukemia subtypes. Furthermore, the variables showing the highest relevance to the extracted features agree with previous biological knowledge concerning copy number alterations and gene expression changes in these subtypes. Finally, the correlation-maximizing methods are shown to yield results which are more biologically interpretable than those resulting from a covariance-maximizing method, and provide different insight compared to when each variable set is studied separately using PCA.

Conclusions: We conclude that regularized dual CCA as well as PCA+CCA are useful methods for exploratory analysis of paired genetic data sets, and can be efficiently implemented also when the number of variables is very large.
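
The dual formulation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `regularized_dual_cca`, the single shared regularization parameter `kappa`, and the particular regularized constraint (linear Gram matrices with a ridge term added, as in common kernel CCA formulations) are all assumptions made for illustration. The point it shows is that every matrix that is factorized has size n × n (samples), no matter how many variables each data set has.

```python
import numpy as np

def regularized_dual_cca(X, Y, kappa=1.0, n_components=2):
    """Sketch of a regularized CCA solved in the dual (sample) space.

    X: (n, p) array, Y: (n, q) array, rows are paired samples.
    kappa: ridge parameter (hypothetical fixed value; the abstract
           selects its regularization parameters by cross-validation).
    """
    # Center each variable so the Gram matrices reflect covariances.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Linear-kernel Gram matrices: n x n, independent of p and q.
    Kx = X @ X.T
    Ky = Y @ Y.T
    Rx = Kx + kappa * np.eye(n)
    Ry = Ky + kappa * np.eye(n)

    # M = Rx^{-1} Kx Ky Ry^{-1}; its singular values are the
    # regularized canonical correlations under this formulation.
    left = np.linalg.solve(Rx, Kx @ Ky)
    M = np.linalg.solve(Ry, left.T).T  # valid because Ry is symmetric
    U, s, Vt = np.linalg.svd(M)

    # Dual coefficient vectors and the resulting sample scores.
    alpha = np.linalg.solve(Rx, U[:, :n_components])
    beta = np.linalg.solve(Ry, Vt[:n_components].T)
    return s[:n_components], Kx @ alpha, Ky @ beta

# Synthetic paired data: few samples, many variables, one shared factor.
rng = np.random.default_rng(0)
n = 20
z = rng.standard_normal((n, 1))
X = z @ rng.standard_normal((1, 500)) + rng.standard_normal((n, 500))
Y = z @ rng.standard_normal((1, 300)) + rng.standard_normal((n, 300))

rho, scores_x, scores_y = regularized_dual_cca(X, Y, kappa=10.0)
print(rho)
```

In practice `kappa` would be tuned, for example by holding out samples and comparing the correlation of the held-out scores, which is in the spirit of the cross-validation the abstract describes.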

Classical canonical correlation analysis greedily maximizes the squared correlation between two sets of variables. As a result, if one variable in the first dataset is very highly correlated with a variable in the second dataset, the canonical correlation will be very high irrespective of the correlation among the remaining variables in the two datasets. We propose an alternative measure of association between two sets of variables that does not allow a select few variables to dominate so strongly that the other variables are prevented from contributing to their representative variables, or canonical variates. Our proposed Representation-Constrained Canonical Correlation Analysis (RCCCA) has the Classical Canonical Correlation Analysis (CCCA) at one end (λ=0) and the Classical Principal Component Analysis (CPCA) at the other (as λ becomes very large). In between, it gives a compromise solution. By a proper choice of λ, one can prevent the representation of the two datasets from being hijacked by a single pair of variables that are highly correlated across those datasets. This advantage of the RCCCA over the CCCA deserves serious attention from researchers using statistical tools for data analysis.
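
The behavior of classical CCA described above can be reproduced with a small Python sketch. It illustrates only the classical method, not RCCCA, whose objective is not given in the abstract. The helper name `canonical_correlations` and the synthetic data are assumptions for illustration; the computation uses the standard result that the canonical correlations are the singular values of the product of orthonormal bases of the two centered data matrices.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Classical canonical correlations of two data sets with paired rows."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered data.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations.
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)

rng = np.random.default_rng(0)
n = 500
X = rng.standard_normal((n, 5))
Y = rng.standard_normal((n, 5))
# Only one pair of variables is strongly related across the data sets;
# all other variables are independent noise.
Y[:, 0] = X[:, 0] + 0.1 * rng.standard_normal(n)

rho = canonical_correlations(X, Y)
print(rho)
```

The leading canonical correlation is close to one even though four of the five variables in each set carry no shared signal, while the remaining canonical correlations stay small.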

Articles published on Classical Canonical Correlation Analysis

Prediction of East Asian summer precipitation via independent component analysis

Restricted kernel canonical correlation analysis

Integrative analysis of gene expression and copy number alterations using canonical correlation analysis

Climate Prediction by a Hybrid Method with Emphasizing Future Precipitation Change of East Asia

Representation-Constrained Canonical Correlation Analysis: A Hybridization of Canonical Correlation and Principal Component Analyses

Nonlinear measures of association with kernel canonical correlation analysis and applications
