Canonical correlation analysis (CCA) is a correlation analysis technique that is widely used in statistics and the machine-learning community. However, the high complexity involved in the training process lays a heavy burden on the processing units and memory system, making CCA nearly impractical in large-scale data. To overcome this issue, a novel CCA method that tries to carry out analysis on the dataset in the Fourier domain is developed in this article. Appling Fourier transform on the data, we can convert the traditional eigenvector computation of CCA into finding some predefined discriminative Fourier bases that can be learned with only element-wise dot product and sum operations, without complex time-consuming calculations. As the eigenvalues come from the sum of individual sample products, they can be estimated in parallel. Besides, thanks to the data characteristic of pattern repeatability, the eigenvalues can be well estimated with partial samples. Accordingly, a progressive estimate scheme is proposed, in which the eigenvalues are estimated through feeding data batch by batch until the eigenvalues sequence is stable in order. As a result, the proposed method shows its characteristics of extraordinarily fast and memory efficiencies. Furthermore, we extend this idea to the nonlinear kernel and deep models and obtained satisfactory accuracy and extremely fast training time consumption as expected. An extensive discussion on the fast Fourier transform (FFT)-CCA is made in terms of time and memory efficiencies. Experimental results on several large-scale correlation datasets, such as MNIST8M, X-RAY MICROBEAM SPEECH, and Twitter Users Data, demonstrate the superiority of the proposed algorithm over state-of-the-art (SOTA) large-scale CCA methods, as our proposed method achieves almost same accuracy with the training time of our proposed method being 1000 times faster. This makes our proposed models best practice models for dealing with large-scale correlation datasets. The source code is available at https://github.com/Mrxuzhao/FFTCCA.
Read full abstract