Abstract

Nonconvex optimization has long been a central focus in deep learning, where many fast momentum-based adaptive algorithms are applied. However, computing the full gradient of the high-dimensional parameter vector in these tasks becomes prohibitive. To reduce the computational cost of optimizers on the nonconvex problems typically seen in deep learning, this work proposes a randomized block-coordinate adaptive optimization algorithm, named RAda, which randomly picks a block from the full coordinates of the parameter vector and then sparsely computes the gradient of that block. We prove that, in the nonconvex case, RAda converges to a δ-accurate solution with a stochastic first-order complexity of O(1/δ²), where δ is the upper bound on the squared norm of the gradient. Experiments on public datasets, including CIFAR-10, CIFAR-100, and Penn Treebank, verify that RAda outperforms the compared algorithms in terms of computational cost.
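
The abstract does not give the algorithm's full details; the following is a minimal sketch of the randomized block-coordinate idea it describes, combining random block sampling with an Adam-style adaptive update on the sampled coordinates only. The function names, block size, step size, moment-estimate form, and the toy quadratic objective are all assumptions for illustration, not the authors' exact method.

```python
import numpy as np

def rada_sketch(grad_fn, x0, steps=1000, block_size=32,
                lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8):
    """Illustrative randomized block-coordinate adaptive update (not the paper's code)."""
    x = x0.astype(float).copy()
    d = x.size
    m = np.zeros(d)   # first-moment estimates, kept per coordinate
    v = np.zeros(d)   # second-moment estimates, kept per coordinate
    rng = np.random.default_rng(0)
    for t in range(1, steps + 1):
        # Randomly pick a block of coordinates and compute only that block's gradient.
        idx = rng.choice(d, size=min(block_size, d), replace=False)
        g = grad_fn(x, idx)
        # Adam-style moment updates restricted to the sampled block.
        m[idx] = beta1 * m[idx] + (1 - beta1) * g
        v[idx] = beta2 * v[idx] + (1 - beta2) * g**2
        m_hat = m[idx] / (1 - beta1**t)
        v_hat = v[idx] / (1 - beta2**t)
        x[idx] -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Toy usage: a separable quadratic whose block gradient is cheap to evaluate.
target = np.linspace(-1.0, 1.0, 256)
grad_fn = lambda x, idx: x[idx] - target[idx]   # block gradient of 0.5*||x - target||^2
x_hat = rada_sketch(grad_fn, np.zeros(256))
print(np.max(np.abs(x_hat - target)))
```

The point of the sketch is the cost model: each step touches only `block_size` coordinates of the gradient and of the moment buffers, which is where the savings over a full-gradient adaptive method would come from.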
