The first Survey and Review article in this issue is “Decay Properties of Spectral Projectors with Applications to Electronic Structure,” by Michele Benzi, Paolo Boito, and Nader Razouk. The linear scaling methods that motivate this work use a reformulation of the conventional electronic structure calculation. Instead of solving a sequence of linear eigenvalue problems, one computes a sequence of spectral projectors, known as density matrices, from which physical quantities of interest follow directly. Large-scale computations then become feasible, provided that the entries of the density matrices are localized; that is, they decay rapidly away from the diagonal.

The authors first introduce the basic principles of electronic structure theory and then survey current computational approaches, most of which arose in the physical sciences, along with their underlying localization requirements. The overview highlights a distinction, often blurred outside the mathematics literature, between physical insight and intuition on the one hand and mathematical rigor on the other. The key contribution is then to formulate the localization problem mathematically, with transparent assumptions, and to develop a unified analytical approach. Tools from matrix analysis and approximation theory are used to justify, where feasible, the localization “results” on which today's algorithms rely.

The theory is framed in an asymptotic regime where the system volume increases but the density of particles remains fixed. This type of thermodynamic limit will be familiar to readers who work on multiscale models in physics and chemistry, but it may seem strange to numerical analysts who are used to “convergence” in the sense of a fixed problem and an arbitrarily fine mesh. The upshot is a sequence of matrices of increasing dimension that look very different from those arising in the discretization of PDEs. After pushing the theory as far as possible, the authors comment on the practical implications of their bounds and raise a number of open questions. They also point out that the decay results may find useful application in other, unrelated areas: quantum information theory, complex networks, and eigenvalue solvers for tridiagonal matrices. This article illustrates how applied analysis can justify some of the leaps of faith made in the physical sciences and can also resolve controversies (section 8.7). It will be of particular interest to readers who work in matrix computation and approximation theory.

The second article, “Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint,” by Ronny Luss and Marc Teboulle, shares with the first the theme of algorithms built on large-scale matrix computations whose performance in practice outstrips our current analytical understanding. Most applied mathematicians know that, given a symmetric matrix $A$, dimension reduction can be achieved by using the dominant eigenvectors (singular vectors) of $A$ to form optimal low-rank approximations. In the statistics literature, modulo a centering operation, the same idea is called principal component analysis (PCA). This type of least-squares approach tries to recover all rows and columns of $A$; equivalently, if we think of $A$ as representing pairwise interactions between the nodes of a network, it aims to summarize the full set of interactions.
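To make the least-squares picture concrete, here is a minimal numpy sketch (our illustration, not code from either article): for a symmetric matrix, the eigenpair of largest magnitude yields the best rank-one approximation in the Frobenius norm.

```python
import numpy as np

# Illustrative only: rank-one approximation of a symmetric matrix via its
# dominant eigenpair, the (uncentered) PCA idea described above.
rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50))
A = (B + B.T) / 2                        # symmetric test matrix

w, V = np.linalg.eigh(A)                 # eigenvalues in ascending order
i = np.argmax(np.abs(w))                 # index of the dominant eigenpair
A1 = w[i] * np.outer(V[:, i], V[:, i])   # best rank-one approximation (Frobenius norm)

print(f"relative error: {np.linalg.norm(A - A1, 'fro') / np.linalg.norm(A, 'fro'):.3f}")
```

Note that the dominant eigenvector computed this way is generically dense; that is precisely what motivates the sparsity constraint discussed next.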
It is, of course, possible to impose a postprocessing threshold to cut off the less important contributions, but it is also reasonable to shoot directly for the important rows and columns, or the important network nodes. This alternative leads to the class of problems discussed here: matrix optimization problems with a prescribed upper bound on the number of nonzeros in the required “eigenvectors.” The authors discuss a variety of computational approaches that have been proposed in the literature and introduce a unifying framework, which they call ConGradU, to characterize and analyze them. The basic iterative scheme, summarized in Algorithm 1, has an optimization substep that admits a closed-form solution in a number of important cases; one such case is sketched below.

The authors also include an informative computational example based on the textual content of State of the Union addresses from 1790--2011. For example, Table 6.2 shows the key word stems associated with the three principal factors for both thresholded PCA and the more direct sparsity-seeking alternative, and it is clear that the sparse version summarizes different information.

This article will appeal to readers who are keen to keep up with developments in modern optimization and to learn about techniques that fall under the current “Big Data” banner.
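As a hedged illustration of the substep just mentioned, the following sketch treats the $\ell_0$-constrained case, where, as we understand the framework, linearizing the quadratic objective reduces the substep to hard thresholding followed by normalization, recovering a truncated power iteration. The function name `sparse_rank_one`, the sparsity parameter `k`, and the toy data are our own choices, not the authors'.

```python
import numpy as np

def sparse_rank_one(A, k, iters=200, seed=0):
    """Schematic ConGradU-style iteration (our sketch, not the authors' code):
    approximately maximize x^T A x over unit vectors with at most k nonzeros.
    Each step linearizes the objective at the current x and solves the substep
    in closed form by hard-thresholding A @ x to its k largest-magnitude entries."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        g = A @ x                          # gradient direction (up to a factor of 2)
        g[np.argsort(np.abs(g))[:-k]] = 0  # keep only the k largest-magnitude entries
        x = g / np.linalg.norm(g)          # renormalize to the unit sphere
    return x

# Toy usage: a symmetric matrix with a planted 5-sparse principal direction.
rng = np.random.default_rng(1)
v = np.zeros(30)
v[:5] = 1 / np.sqrt(5)
N = 0.1 * rng.standard_normal((30, 30))
A = 10 * np.outer(v, v) + (N + N.T) / 2
x = sparse_rank_one(A, k=5)
print("support found:", np.nonzero(x)[0])  # typically recovers indices 0..4
```

With the constraint removed (k equal to the dimension), the scheme collapses to the classical power method, which makes the connection with ordinary PCA transparent.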