Abstract

Principal component analysis (PCA) is widely used for data reduction in group independent component analysis (ICA) of fMRI data. Commonly, group-level PCA of temporally concatenated datasets is computed prior to ICA of the group principal components. This work focuses on reducing very high-dimensional temporally concatenated datasets to their group PCA space. Existing randomized PCA methods can determine the PCA subspace with minimal memory requirements and, thus, are ideal for solving large PCA problems. However, because the number of dataloads is typically not optimized in these methods, we extend one of them to compute the PCA of very large datasets with a minimal number of dataloads. This method is coined multi power iteration (MPOWIT). The key idea behind MPOWIT is to estimate a subspace larger than the desired one, while checking for convergence of only the smaller subset of interest. The number of iterations, and hence the number of dataloads, is reduced considerably, accelerating convergence without loss of accuracy. More importantly, in the proposed implementation of MPOWIT, the memory required for successful recovery of the group principal components is independent of the number of subjects analyzed. Highly efficient subsampled eigenvalue decomposition techniques are also introduced, furnishing excellent PCA subspace approximations that can be used for intelligent initialization of randomized methods such as MPOWIT. Together, these developments enable efficient estimation of accurate principal components, as we illustrate by solving a 1600-subject group-level PCA of fMRI data with standard acquisition parameters, on a regular desktop computer with only 4 GB RAM, in just a few hours. MPOWIT is also highly scalable and could realistically solve group-level PCA of fMRI on thousands of subjects, or more, using standard hardware, limited only by time, not memory. The MPOWIT algorithm is also highly parallelizable, which would enable fast, distributed implementations ideal for big data analysis. Implications for other methods, such as expectation maximization PCA (EM PCA), are also presented. Based on our results, general recommendations for efficient application of PCA methods are given according to problem size and available computational resources. MPOWIT and all other methods discussed here are implemented and readily available in the open source GIFT software.
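To make the key idea concrete, the following is a minimal sketch of oversized-block subspace iteration in the spirit of MPOWIT, written in Python/NumPy purely for illustration (it is not the GIFT implementation): a block larger than the desired number of components is iterated, but convergence is monitored only on the leading k eigenvalue estimates. The function name, default block size, and convergence criterion are illustrative assumptions.

```python
import numpy as np

def mpowit_sketch(X, k, extra=None, max_iter=100, tol=1e-6, seed=0):
    """Illustrative oversized-block subspace iteration (MPOWIT-like sketch).

    X : mean-centered data matrix (features x observations); the leading
        left singular vectors of X are the principal directions sought.
    k : number of principal components actually wanted.
    extra : additional components carried along to speed convergence
            (defaulting to k here is an arbitrary illustrative choice).
    """
    rng = np.random.default_rng(seed)
    block = k + (extra if extra is not None else k)    # oversized subspace
    Q, _ = np.linalg.qr(rng.standard_normal((X.shape[0], block)))
    prev = np.zeros(k)
    for _ in range(max_iter):
        Y = X @ (X.T @ Q)                 # one pass over the data: Y = (X X^T) Q
        B = Q.T @ Y                       # small projected (Rayleigh-Ritz) matrix
        ritz = np.sort(np.linalg.eigvalsh(B))[::-1]
        Q, _ = np.linalg.qr(Y)            # re-orthonormalize for the next pass
        # Convergence is checked only on the k leading eigenvalue estimates,
        # even though the full oversized block keeps being iterated.
        if np.max(np.abs(ritz[:k] - prev)) <= tol * max(ritz[0], 1e-12):
            break
        prev = ritz[:k]
    # Final Rayleigh-Ritz step to extract the top-k components from the block.
    Y = X @ (X.T @ Q)
    evals, W = np.linalg.eigh(Q.T @ Y)
    order = np.argsort(evals)[::-1][:k]
    return Q @ W[:, order], evals[order]
```

This sketch only conveys the oversized-block and selective-convergence idea; in MPOWIT the block size, initialization, and per-iteration data handling are what keep the number of dataloads small.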

Highlights

  • Principal component analysis (PCA) is used as both a data reduction and de-noising method in group independent component analysis (ICA) (Calhoun et al., 2001; Beckmann and Smith, 2004; Calhoun and Adali, 2012)

  • We show how to overcome the problem of slow convergence in subspace iteration when a large number of components is estimated, by introducing a new approach named multi power iteration (MPOWIT)

  • Our approach takes into account the number of dataloads, which has often been overlooked in the development of randomized PCA methods. We show that both subspace iteration and expectation maximization PCA (EM PCA) converge to the same subspace at each iteration (illustrated in the sketch following this list)
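The equivalence claim in the last highlight can be checked numerically: one subspace-iteration update and one EM PCA update (the standard E/M steps of Roweis-style EM PCA) differ only by an invertible k × k transform, so they span the same subspace. The toy dimensions below are arbitrary and the snippet is purely illustrative, not the paper's fMRI setting.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((60, 300))     # toy data matrix (features x observations)
k = 5
W = rng.standard_normal((60, k))       # common random starting subspace

# One subspace-iteration step: the update spans range(C W), with C = X X^T.
W_sub = X @ (X.T @ W)

# One EM PCA step (E-step then M-step):
#   Y = (W^T W)^{-1} W^T X,   W_em = X Y^T (Y Y^T)^{-1}
Y = np.linalg.solve(W.T @ W, W.T @ X)
W_em = X @ Y.T @ np.linalg.inv(Y @ Y.T)

# Both updates differ only by an invertible k x k factor, so the orthogonal
# projectors onto their column spaces coincide (up to numerical error).
Q_sub = np.linalg.qr(W_sub)[0]
Q_em = np.linalg.qr(W_em)[0]
print(np.allclose(Q_sub @ Q_sub.T, Q_em @ Q_em.T, atol=1e-8))   # expected: True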


Introduction

Principal component analysis (PCA) is used as both a data reduction and de-noising method in group independent component analysis (ICA) (Calhoun et al., 2001; Beckmann and Smith, 2004; Calhoun and Adali, 2012). PCA is typically carried out by computing the eigenvalue decomposition (EVD) of the sample covariance matrix (C) or by using singular value decomposition (SVD) directly on the data. For large datasets, both EVD (plus the computation of C) and SVD become computationally intensive in both memory and speed. The computational requirements can quickly become prohibitive, especially with the constant advance of imaging techniques [such as multi-band EPI sequences (Feinberg et al., 2010; Feinberg and Setsompop, 2013)] and a tendency to share data within the imaging community. This means that very large imaging datasets will become even more common in fMRI studies, encouraging the development of novel computational methods to face the upcoming challenges.
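As a generic illustration of the two standard routes mentioned above (and of why explicitly forming C is the expensive part), the following Python/NumPy snippet computes the same leading principal directions via EVD of the sample covariance and via SVD of the data. The toy dimensions are arbitrary and do not reflect an actual fMRI dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((800, 150))            # toy data matrix (features x observations)
X -= X.mean(axis=1, keepdims=True)             # mean-center each feature
n = X.shape[1]

# Route 1: eigenvalue decomposition of the sample covariance C = X X^T / (n - 1).
# Forming and decomposing C (features x features) is what becomes prohibitive
# in memory and time as the data grow.
C = (X @ X.T) / (n - 1)
evals, evecs = np.linalg.eigh(C)
pcs_evd = evecs[:, np.argsort(evals)[::-1][:5]]

# Route 2: singular value decomposition directly on the data, never forming C.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
pcs_svd = U[:, :5]

# The leading principal directions from both routes agree up to the sign
# of each component.
print(np.allclose(np.abs(pcs_evd.T @ pcs_svd), np.eye(5), atol=1e-8))   # expected: True
```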
