Abstract

We introduce a fast algorithm for entrywise evaluation of the Gauss--Newton Hessian (GNH) matrix of a fully connected feed-forward neural network. The algorithm has a precomputation step and a sampling step. While it generally requires $\mathcal{O}(Nn)$ work to compute an entry (and the entire column containing it) of the GNH matrix for a neural network with $N$ parameters and $n$ data points, our fast sampling algorithm reduces the cost to $\mathcal{O}(n + d/\epsilon^2)$ work, where $d$ is the output dimension of the network and $\epsilon$ is a prescribed accuracy (the cost is independent of $N$). One application of our algorithm is constructing the hierarchical-matrix ($\mathcal{H}$-matrix) approximation of the GNH matrix for solving linear systems and eigenvalue problems. Storing and factorizing the GNH matrix directly requires $\mathcal{O}(N^2)$ memory and $\mathcal{O}(N^3)$ work, respectively; the $\mathcal{H}$-matrix approximation requires only $\mathcal{O}(N r_o)$ memory and $\mathcal{O}(N r_o^2)$ work to factorize, where $r_o \ll N$ is the maximum rank of the off-diagonal blocks of the GNH matrix. We demonstrate the performance of our fast algorithm and the $\mathcal{H}$-matrix approximation on classification and autoencoder neural networks.
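To make the sampling idea concrete, below is a minimal NumPy sketch (our own illustration, not the paper's algorithm). It assumes the per-sample Jacobian tensor `J` of shape `(n, d, N)` is available, so that under a least-squares loss a GNH entry is a double sum over data points and output channels; sampling index pairs uniformly then gives an unbiased estimate whose error shrinks with the number of samples $s$, i.e. $s = \mathcal{O}(1/\epsilon^2)$ for accuracy $\epsilon$. The function names, shapes, and uniform sampling scheme are assumptions for illustration; the paper's $\mathcal{O}(n + d/\epsilon^2)$ method additionally relies on a precomputation step.

```python
import numpy as np

# Hypothetical setup (not from the paper): J has shape (n, d, N), holding the
# Jacobian of the d network outputs w.r.t. the N parameters for each of the
# n data points. For a least-squares loss, the GNH entry is
#     G[j, k] = (1/n) * sum_i sum_c J[i, c, j] * J[i, c, k].
rng = np.random.default_rng(0)
n, d, N = 200, 10, 5000
J = rng.standard_normal((n, d, N)) / np.sqrt(N)

def gnh_entry_exact(J, j, k):
    """Exact GNH entry: O(n*d) given J, but O(N*n) if J must first be formed."""
    n = J.shape[0]
    return np.einsum('ic,ic->', J[:, :, j], J[:, :, k]) / n

def gnh_entry_sampled(J, j, k, num_samples):
    """Monte Carlo estimate of G[j, k]: sample (i, c) index pairs uniformly.

    This is only an illustrative unbiased estimator needing O(1/eps^2)
    samples for accuracy eps; the paper's algorithm combines precomputation
    with a tailored sampling step to reach O(n + d/eps^2) work per entry.
    """
    n, d, _ = J.shape
    i = rng.integers(0, n, size=num_samples)
    c = rng.integers(0, d, size=num_samples)
    # Each product J[i,c,j]*J[i,c,k] is an unbiased sample of the double sum
    # divided by n*d, so rescaling by d recovers the (1/n)-normalized entry.
    return d * np.mean(J[i, c, j] * J[i, c, k])

j, k = 3, 4321
print(gnh_entry_exact(J, j, k))
print(gnh_entry_sampled(J, j, k, num_samples=20000))
```

Note that the sampled estimator never touches the full $N$-dimensional parameter axis beyond the two columns $j$ and $k$, which is the source of the $N$-independence of the per-entry cost.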
