Abstract

A rank-$r$ matrix $X \in \mathbb{R}^{m \times n}$ can be written as a product $U V^\top$, where $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{n \times r}$. One can exploit this observation in optimization: e.g., consider the minimization of a convex function $f(X)$ over rank-$r$ matrices, where the set of low-rank matrices is modeled via $UV^\top$. Though this parameterization reduces the number of variables and is more computationally efficient (of particular interest is the case $r \ll \min\{m, n\}$), it comes at a cost: $f(U V^\top)$ becomes a nonconvex function of $U$ and $V$. We study this parameterization for generic convex objectives $f$ and focus on first-order gradient descent algorithms. We propose the bifactored gradient descent (\texttt{BFGD}) algorithm, an efficient first-order method that operates directly on the $U, V$ factors. We show that when $f$ is (restricted) smooth, \texttt{BFGD} has local sublinear convergence; when $f$ is both (restricted) smooth and (restricted) strongly convex, it has local linear convergence. For several applications, we describe simple and efficient initialization schemes that yield initial points for which the above convergence guarantees hold globally. Extensive experimental results support our argument that \texttt{BFGD} is an efficient and accurate nonconvex method, compared to state-of-the-art approaches.
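To make the factored approach concrete, below is a minimal NumPy sketch of gradient descent run directly on the factors $U, V$ of $X = UV^\top$, using the chain rule $\nabla_U f(UV^\top) = \nabla f(UV^\top)\, V$ and $\nabla_V f(UV^\top) = \nabla f(UV^\top)^\top U$. The objective, step size, and initialization are illustrative assumptions and this is not the paper's exact \texttt{BFGD} update (which prescribes its own step size and initialization).

```python
import numpy as np

def factored_gradient_descent(grad_f, U0, V0, eta=1e-2, iters=2000):
    """Illustrative sketch: gradient descent on the factors U, V of X = U V^T.

    grad_f(X) must return the gradient of f at X (an m x n matrix).
    The step size eta and iteration count are assumptions, not the paper's choices.
    """
    U, V = U0.copy(), V0.copy()
    for _ in range(iters):
        G = grad_f(U @ V.T)  # gradient of f at the current iterate X = U V^T
        # Simultaneous update of both factors using the same gradient G:
        # d f(U V^T) / dU = G V,   d f(U V^T) / dV = G^T U
        U, V = U - eta * G @ V, V - eta * G.T @ U
    return U, V

# Hypothetical example: f(X) = 0.5 * ||X - M||_F^2, whose gradient is X - M,
# with a synthetic rank-r target M.
m, n, r = 50, 40, 3
rng = np.random.default_rng(0)
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
U0 = 0.1 * rng.standard_normal((m, r))
V0 = 0.1 * rng.standard_normal((n, r))
U, V = factored_gradient_descent(lambda X: X - M, U0, V0)
print(np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))  # relative recovery error
```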
