Abstract

Matrix decompositions are fundamental tools in applied mathematics, statistical computing, and machine learning. Low-rank matrix decompositions in particular are vital and widely used for data analysis, dimensionality reduction, and data compression. Massive datasets, however, pose a computational challenge for traditional algorithms, placing significant constraints on both memory and processing power. Recently, the powerful concept of randomness has been introduced as a strategy to ease this computational load. The essential idea of probabilistic algorithms is to employ some amount of randomness in order to derive a smaller matrix from a high-dimensional data matrix; the smaller matrix is then used to compute the desired low-rank approximation. Such algorithms are shown to be computationally efficient for approximating matrices with low-rank structure. We present the R package rsvd and provide a tutorial introduction to randomized matrix decompositions. Specifically, we discuss randomized routines for the singular value decomposition, (robust) principal component analysis, the interpolative decomposition, and the CUR decomposition. Several examples demonstrate the routines and show their computational advantage over other methods implemented in R.
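To make this essential idea concrete, the two-stage scheme can be sketched in a few lines of R. This is a minimal illustration under our own naming (the helper rand_svd() and its oversampling parameter p are not part of the rsvd package):

```r
# Minimal sketch of a basic randomized SVD: project A onto a random
# test matrix to obtain a small sketch, then decompose the sketch.
rand_svd <- function(A, k, p = 10) {
  n <- ncol(A)
  l <- min(k + p, n)                   # target rank plus oversampling
  Omega <- matrix(rnorm(n * l), n, l)  # Gaussian random test matrix
  Y <- A %*% Omega                     # sample the column space of A
  Q <- qr.Q(qr(Y))                     # orthonormal basis for the samples
  B <- crossprod(Q, A)                 # small l-by-n matrix: t(Q) %*% A
  s <- svd(B, nu = k, nv = k)          # cheap deterministic SVD
  list(u = Q %*% s$u, d = s$d[1:k], v = s$v)
}
```

The approximation is then A ≈ u diag(d) vᵀ. The package routines refine this basic scheme, for instance with power iterations for matrices whose singular values decay slowly.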

Highlights

  • In the era of “big data”, vast amounts of data are being collected and curated in the form of arrays across the social, physical, engineering, biological, and ecological sciences

  • The rpca() function provides an efficient routine for computing the dominant principal components using Algorithm 6 (a usage sketch follows this list)

  • Candès, Li, Ma, and Wright (2011) proved that it is possible to exactly separate such a data matrix A ∈ ℝ^{m×n} into its low-rank and sparse components, under rather broad assumptions. This is achieved by solving a convex optimization problem called principal component pursuit (PCP): minimize ‖L‖∗ + λ‖S‖₁ subject to L + S = A (see the sketch after this list)
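
The following usage sketch ties these highlights together. The simulated data, the chosen rank k = 5, and the corruption pattern are our own illustrative assumptions, so consult the package documentation for the actual defaults of rpca() and rrpca():

```r
library(rsvd)

# Randomized PCA: dominant principal components of a data matrix.
X <- matrix(rnorm(500 * 100), 500, 100)
pca <- rpca(X, k = 5)                 # first five principal components
summary(pca)

# Randomized robust PCA: separate a corrupted matrix into its
# low-rank component L and sparse component S via PCP.
L0 <- matrix(rnorm(500 * 5), 500, 5) %*% matrix(rnorm(5 * 100), 5, 100)
S0 <- matrix(0, 500, 100)
S0[sample(length(S0), 500)] <- 10     # sparse gross corruptions
fit <- rrpca(L0 + S0)
# fit$L and fit$S approximate the low-rank and sparse components.
```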


Summary

Introduction

In the era of “big data”, vast amounts of data are being collected and curated in the form of arrays across the social, physical, engineering, biological, and ecological sciences. Analysis of these data relies on a variety of matrix decomposition methods that seek to exploit the low-rank features exhibited by high-dimensional data. Despite our ever-increasing computational power, the emergence of large-scale datasets has severely challenged our ability to analyze data using traditional matrix algorithms. The computationally expensive singular value decomposition (SVD) is the most ubiquitous method for dimensionality reduction, data processing, and compression. The concept of randomness has recently been demonstrated to be an effective strategy for easing the computational demands of low-rank approximations from matrix decompositions such as the SVD, allowing for a scalable architecture for modern “big data” applications. Throughout this paper, we make the following assumption: the data matrix to be approximated has low-rank structure, i.e., its rank is smaller than the ambient dimension of the measurement space.
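
To fix ideas, the following self-contained sketch (with dimensions and rank chosen arbitrarily for illustration) constructs a matrix satisfying this low-rank assumption and compares the package's randomized routine against the deterministic svd() from base R:

```r
library(rsvd)

m <- 3000; n <- 2000; k <- 20
A <- matrix(rnorm(m * k), m, k) %*% matrix(rnorm(k * n), k, n)  # rank-20

system.time(s_full <- svd(A))          # full deterministic SVD
system.time(s_rand <- rsvd(A, k = k))  # randomized truncated SVD

# Relative Frobenius-norm error of the rank-k reconstruction.
A_hat <- s_rand$u %*% diag(s_rand$d) %*% t(s_rand$v)
norm(A - A_hat, "F") / norm(A, "F")
```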

Randomness as a computational strategy
Motivation and contributions
Organization
Notation
Probabilistic framework for low-rank approximations
The generic randomized algorithm
Improved randomized algorithm
Random test matrices
Randomized singular value decompositions
Brief historical overview
Conceptual overview
Randomized algorithm
Theoretical performance
Existing functionality for SVD in R
SVD example
Computational performance
Randomized principal component analysis
Existing functionality for PCA in R
PCA example
Randomized robust principal component analysis
The inexact augmented Lagrange multiplier method
Existing functionality for robust PCA in R
Robust PCA example
Additional functionality
Randomized CUR decomposition
Randomized interpolative decomposition
Conclusion