Abstract

In recent years, the amount of data that has been generated and recorded has grown enormously, and data are now seen to be at the heart of modern economic activity, innovation, and growth. See, for example, the report by the McKinsey Global Institute [51], which identifies ways in which Big Data have transformed the modern world, as well as the report by the National Research Council [19], which discusses reasons for and technical challenges in massive data analysis. In many cases, these so-called Big Data are modeled as matrices, since an m × n matrix A provides a natural mathematical structure with which to encode information about m objects, each of which is described by n features. As a result, while linear algebra algorithms have been of interest for decades in areas such as Numerical Linear Algebra (NLA) and scientific computing, in recent years there has been renewed interest in developing matrix algorithms that are appropriate for the analysis of large datasets that are represented in the form of matrices. For example, tools such as the Singular Value Decomposition (SVD) and the related Principal Components Analysis (PCA) [38] permit the low-rank approximation of a matrix, and they have had a profound impact in diverse areas of science and engineering. They have also been studied extensively in large-scale machine learning and data analysis applications, in settings ranging from web search engines and social network analysis to the analysis of astronomical and biological data. Importantly, the structural and noise properties of matrices that arise in machine learning and data analysis applications are typically very different from those of matrices that arise in scientific computing and NLA. This has led researchers to revisit traditional problems in light of new requirements and to consider novel algorithmic approaches to many traditional matrix problems.
One of the more remarkable trends in recent years is a new paradigm that arose in Theoretical Computer Science (TCS) and that involves the use of randomization as a computational resource for the design and analysis of algorithms for fundamental matrix problems. Randomized Numerical Linear Algebra (RandNLA) is the interdisciplinary research area that exploits randomization as a computational resource to develop improved algorithms for large-scale linear algebra problems, e.g., matrix multiplication, linear regression, low-rank matrix approximation, etc. [49]. In this chapter, we will discuss RandNLA, with an emphasis on highlighting how many of the most
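As a quick illustration of the RandNLA paradigm described above, the following sketch approximates a low-rank SVD of a matrix by first projecting it onto a small random subspace and then decomposing the much smaller projected matrix. This is one common randomized approach, not necessarily the specific algorithm treated in this chapter; the rank and oversampling values here are illustrative assumptions.

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, seed=None):
    """Illustrative randomized low-rank SVD sketch.

    Samples the range of A with a random Gaussian test matrix,
    orthonormalizes the sample, and computes an exact SVD of the
    small projected matrix Q^T A.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Random Gaussian test matrix with a few extra (oversampling) columns
    Omega = rng.standard_normal((n, rank + oversample))
    # Orthonormal basis for an approximation of the range of A
    Q, _ = np.linalg.qr(A @ Omega)
    # SVD of the small (rank + oversample) x n matrix
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank, :]

# Usage: a 500 x 200 matrix of exact rank 5 is recovered almost perfectly
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 200))
U, s, Vt = randomized_svd(A, rank=5, seed=0)
rel_err = np.linalg.norm(A - U @ np.diag(s) @ Vt) / np.linalg.norm(A)
```

The key point, in the spirit of the paradigm sketched above, is that the expensive dense SVD is applied only to a matrix with roughly `rank + oversample` rows, while the random projection itself costs a single matrix multiplication.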
