Subspace Approximation Research Articles

Untargeted metabolomic studies generate information-rich, high-dimensional, and complex datasets that remain challenging to handle and fully exploit. Despite the remarkable progress in the development of tools and algorithms, the “exhaustive” extraction of information from these metabolomic datasets is still a non-trivial undertaking. A conversation on data mining strategies for a maximal information extraction from metabolomic data is needed. Using a liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomic dataset, this study explored the influence of collection parameters in the data pre-processing step, scaling and data transformation on the statistical models generated, and feature selection, thereafter. Data obtained in positive mode generated from a LC-MS-based untargeted metabolomic study (sorghum plants responding dynamically to infection by a fungal pathogen) were used. Raw data were pre-processed with MarkerLynxTM software (Waters Corporation, Manchester, UK). Here, two parameters were varied: the intensity threshold (50–100 counts) and the mass tolerance (0.005–0.01 Da). After the pre-processing, the datasets were imported into SIMCA (Umetrics, Umea, Sweden) for more data cleaning and statistical modeling. In addition, different scaling (unit variance, Pareto, etc.) and data transformation (log and power) methods were explored. The results showed that the pre-processing parameters (or algorithms) influence the output dataset with regard to the number of defined features. Furthermore, the study demonstrates that the pre-treatment of data prior to statistical modeling affects the subspace approximation outcome: e.g., the amount of variation in X-data that the model can explain and predict. The pre-processing and pre-treatment steps subsequently influence the number of statistically significant extracted/selected features (variables). Thus, as informed by the results, to maximize the value of untargeted metabolomic data, understanding of the data structures and exploration of different algorithms and methods (at different steps of the data analysis pipeline) might be the best trade-off, currently, and possibly an epistemological imperative.

Principal component analysis (PCA) is widely used for data reduction in group independent component analysis (ICA) of fMRI data. Commonly, group-level PCA of temporally concatenated datasets is computed prior to ICA of the group principal components. This work focuses on reducing very high dimensional temporally concatenated datasets into its group PCA space. Existing randomized PCA methods can determine the PCA subspace with minimal memory requirements and, thus, are ideal for solving large PCA problems. Since the number of dataloads is not typically optimized, we extend one of these methods to compute PCA of very large datasets with a minimal number of dataloads. This method is coined multi power iteration (MPOWIT). The key idea behind MPOWIT is to estimate a subspace larger than the desired one, while checking for convergence of only the smaller subset of interest. The number of iterations is reduced considerably (as well as the number of dataloads), accelerating convergence without loss of accuracy. More importantly, in the proposed implementation of MPOWIT, the memory required for successful recovery of the group principal components becomes independent of the number of subjects analyzed. Highly efficient subsampled eigenvalue decomposition techniques are also introduced, furnishing excellent PCA subspace approximations that can be used for intelligent initialization of randomized methods such as MPOWIT. Together, these developments enable efficient estimation of accurate principal components, as we illustrate by solving a 1600-subject group-level PCA of fMRI with standard acquisition parameters, on a regular desktop computer with only 4 GB RAM, in just a few hours. MPOWIT is also highly scalable and could realistically solve group-level PCA of fMRI on thousands of subjects, or more, using standard hardware, limited only by time, not memory. Also, the MPOWIT algorithm is highly parallelizable, which would enable fast, distributed implementations ideal for big data analysis. Implications to other methods such as expectation maximization PCA (EM PCA) are also presented. Based on our results, general recommendations for efficient application of PCA methods are given according to problem size and available computational resources. MPOWIT and all other methods discussed here are implemented and readily available in the open source GIFT software.

Subspace Approximation Research Articles

Related Topics

Articles published on Subspace Approximation

Approximating the large sparse matrix exponential using incomplete orthogonalization and Krylov subspaces of variable dimension

An investigation of GPU-based stiff chemical kinetics integration methods

Rational approximation of affine coordinate subspaces of Euclidean space

Rectifiable-Reifenberg and the regularity of stationary and minimizing harmonic maps

An Efficient Reduced Basis Solver for Stochastic Galerkin Matrix Equations

A Hybrid Riemann Solver for Large Hyperbolic Systems of Conservation Laws

Hybrid and Iteratively Reweighted Regularization by Unbiased Predictive Risk and Weighted GCV for Projected Systems

A Conversation on Data Mining Strategies in LC-MS Untargeted Metabolomics: Pre-Processing and Pre-Treatment Steps.

Acceleration of contour integration techniques by rational Krylov subspace methods

Sub-space approximations for MDO problems with disparate disciplinary variable dependence

Krylov Integration Factor Method on Sparse Grids for High Spatial Dimension Convection–Diffusion Equations

A versatile sparse representation based post-processing method for improving image super-resolution

Memory Efficient PCA Methods for Large Group ICA.

Krylov single-step implicit integration factor WENO methods for advection–diffusion–reaction equations

A Krylov semi-implicit discontinuous Galerkin method for the computation of ground and excited states in Bose–Einstein condensates

Error bounds and estimates for Krylov subspace approximations of Stieltjes matrix functions

Robust Environmental Sound Recognition With Fast Noise Suppression for Home Automation

SubspaceEM: A fast maximum-a-posteriori algorithm for cryo-EM single particle reconstruction.

Fast Subspace Approximation Via Greedy Least-Squares

SubPatCNV: approximate subspace pattern mining for mapping copy-number variations.

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Subspace Approximation Research Articles

Related Topics

Articles published on Subspace Approximation

Approximating the large sparse matrix exponential using incomplete orthogonalization and Krylov subspaces of variable dimension

An investigation of GPU-based stiff chemical kinetics integration methods

Rational approximation of affine coordinate subspaces of Euclidean space

Rectifiable-Reifenberg and the regularity of stationary and minimizing harmonic maps

An Efficient Reduced Basis Solver for Stochastic Galerkin Matrix Equations

A Hybrid Riemann Solver for Large Hyperbolic Systems of Conservation Laws

Hybrid and Iteratively Reweighted Regularization by Unbiased Predictive Risk and Weighted GCV for Projected Systems

A Conversation on Data Mining Strategies in LC-MS Untargeted Metabolomics: Pre-Processing and Pre-Treatment Steps.

Acceleration of contour integration techniques by rational Krylov subspace methods

Sub-space approximations for MDO problems with disparate disciplinary variable dependence

Krylov Integration Factor Method on Sparse Grids for High Spatial Dimension Convection–Diffusion Equations

A versatile sparse representation based post-processing method for improving image super-resolution

Memory Efficient PCA Methods for Large Group ICA.

Krylov single-step implicit integration factor WENO methods for advection–diffusion–reaction equations

A Krylov semi-implicit discontinuous Galerkin method for the computation of ground and excited states in Bose–Einstein condensates

Error bounds and estimates for Krylov subspace approximations of Stieltjes matrix functions

Robust Environmental Sound Recognition With Fast Noise Suppression for Home Automation

SubspaceEM: A fast maximum-a-posteriori algorithm for cryo-EM single particle reconstruction.

Fast Subspace Approximation Via Greedy Least-Squares

SubPatCNV: approximate subspace pattern mining for mapping copy-number variations.