Abstract

Recently, second-order distributed optimization algorithms have become a research hotspot in distributed learning, owing to their faster convergence rates compared with first-order algorithms. However, second-order algorithms typically suffer from a severe communication bottleneck. To address this challenge, we propose communication-efficient second-order distributed optimization algorithms in the parameter-server framework, by incorporating compressed lazy Hessians into cubic Newton methods. Specifically, our algorithms require each worker to communicate compressed Hessians with the server only at certain iterations, which saves both communication bits and communication rounds. For non-convex problems, we theoretically prove that our algorithms reduce the communication cost compared with state-of-the-art second-order algorithms, while maintaining the same iteration complexity order O(ϵ^{-3/2}) as centralized cubic Newton methods. By further using a gradient regularization technique, our algorithms achieve global convergence for convex problems. Moreover, for strongly convex problems, our algorithms achieve a local superlinear convergence rate without any requirement on the initial conditions. Finally, numerical experiments demonstrate the high efficiency of the proposed algorithms.
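
As a rough illustration of the communication pattern described above, the following is a minimal Python sketch of a parameter-server loop in which workers send gradients every round but send Hessians only every few rounds ("lazy") and in compressed form before a cubic-regularized Newton step. The top-k compressor, the least-squares local losses, the gradient-descent cubic-subproblem solver, and all numerical constants are assumptions made for illustration; they are not the paper's actual compressor, loss functions, or update rule.

import numpy as np

# Minimal sketch, NOT the paper's algorithm: compressor, local losses,
# subproblem solver, and constants below are illustrative assumptions.

def top_k(matrix, k):
    # Hypothetical contractive compressor: keep the k largest-magnitude entries.
    flat = matrix.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    out = np.zeros_like(flat)
    out[idx] = flat[idx]
    return out.reshape(matrix.shape)

def local_grad_hess(x, A, b):
    # Illustrative local loss f_i(x) = 0.5 * ||A x - b||^2.
    r = A @ x - b
    return A.T @ r, A.T @ A

def cubic_step(g, H, M, iters=100):
    # Approximately minimize the cubic model g^T s + 0.5 s^T H s + (M/6)||s||^3
    # by plain gradient descent (a crude stand-in for an exact subproblem solver).
    s = np.zeros_like(g)
    lr = 1.0 / (np.linalg.norm(H, 2) + M + 1.0)
    for _ in range(iters):
        s -= lr * (g + H @ s + 0.5 * M * np.linalg.norm(s) * s)
    return s

rng = np.random.default_rng(0)
d, n_workers, lazy_period, k, M = 5, 4, 3, 10, 10.0
data = [(rng.standard_normal((20, d)), rng.standard_normal(20)) for _ in range(n_workers)]
x = np.zeros(d)
H_server = np.zeros((d, d))          # server's (possibly stale) aggregated Hessian

for t in range(12):
    grads, hess_msgs = [], []
    for A, b in data:                # each worker computes local information
        g, H = local_grad_hess(x, A, b)
        grads.append(g)              # gradients are communicated every round
        if t % lazy_period == 0:     # Hessians only every `lazy_period` rounds ("lazy"),
            hess_msgs.append(top_k(H, k))   # and compressed before transmission
    g_avg = np.mean(grads, axis=0)
    if hess_msgs:                    # server refreshes its Hessian estimate lazily
        H_server = np.mean(hess_msgs, axis=0)
        H_server = 0.5 * (H_server + H_server.T)   # re-symmetrize after compression
    x = x + cubic_step(g_avg, H_server, M)
    print(f"round {t:2d}  ||grad|| = {np.linalg.norm(g_avg):.4f}")

In this toy loop the Hessian traffic is cut both in rounds (only every lazy_period iterations) and in bits (only k of the d*d entries per message), which is the communication-saving effect the abstract refers to.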
