Abstract

The natural gradient method provides a powerful paradigm for training statistical models and offers several appealing theoretical benefits. However, because it constructs the Fisher information matrix to correct the ordinary gradients, its cost can become prohibitive in large-scale settings. This paper proposes a quasi-natural gradient method to address this computational issue. The method employs a rank-one model to capture second-order information from the underlying statistical manifold and to iteratively update approximations of the Fisher matrix. A three-loop procedure is designed to implement the updating formulas. This procedure resembles the classical two-loop procedure of the limited-memory BFGS method but uses half the memory and can be made faster. The resulting method retains the convergence-rate advantages of existing stochastic optimization methods and has an inherent ability to handle nonconvexity. Numerical studies on several machine learning tasks demonstrate reduced convergence time and greater robustness to nonconvexity relative to stochastic gradient descent and the online limited-memory BFGS method.
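Since the abstract positions its three-loop procedure against the classical two-loop recursion of limited-memory BFGS, a minimal NumPy sketch of that standard two-loop recursion is given below as a reference point. This is not the paper's quasi-natural gradient procedure; the function and variable names are illustrative assumptions, and the curvature pairs (s_i, y_i) follow the usual L-BFGS definitions.

```python
import numpy as np

def lbfgs_two_loop(grad, s_list, y_list):
    """Classical L-BFGS two-loop recursion (reference only, not the paper's method).

    Computes H_k @ grad, where H_k is the implicit inverse-Hessian approximation
    built from the m most recent curvature pairs (s_i, y_i), with
    s_i = x_{i+1} - x_i and y_i = g_{i+1} - g_i.
    """
    q = grad.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []

    # First loop: traverse pairs from newest to oldest.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * np.dot(s, q)
        q -= alpha * y
        alphas.append(alpha)

    # Initial scaling H_0 = gamma * I from the most recent pair.
    gamma = np.dot(s_list[-1], y_list[-1]) / np.dot(y_list[-1], y_list[-1])
    r = gamma * q

    # Second loop: traverse pairs from oldest to newest.
    for s, y, rho, alpha in zip(s_list, y_list, rhos, reversed(alphas)):
        beta = rho * np.dot(y, r)
        r += (alpha - beta) * s

    return r  # the search direction is -r
```

Each of the two loops above touches both the s and y histories, which is where the memory footprint the abstract refers to comes from; the proposed three-loop procedure is reported to halve that storage while admitting a faster implementation.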
