Distributed gradient-free and projection-free algorithm for stochastic constrained optimization

Jie Hou,Xianlin Zeng,Chen Chen

doi:10.1007/s43684-024-00062-0

Abstract

Distributed stochastic zeroth-order optimization (DSZO), in which the objective function is allocated over multiple agents and the derivative of cost functions is unavailable, arises frequently in large-scale machine learning and reinforcement learning. This paper introduces a distributed stochastic algorithm for DSZO in a projection-free and gradient-free manner via the Frank-Wolfe framework and the stochastic zeroth-order oracle (SZO). Such a scheme is particularly useful in large-scale constrained optimization problems where calculating gradients or projection operators is impractical, costly, or when the objective function is not differentiable everywhere. Specifically, the proposed algorithm, enhanced by recursive momentum and gradient tracking techniques, guarantees convergence with just a single batch per iteration. This significant improvement over existing algorithms substantially lowers the computational complexity. Under mild conditions, we prove that the complexity bounds on SZO of the proposed algorithm are O(n/ϵ2)\\documentclass[12pt]{minimal} \\usepackage{amsmath} \\usepackage{wasysym} \\usepackage{amsfonts} \\usepackage{amssymb} \\usepackage{amsbsy} \\usepackage{mathrsfs} \\usepackage{upgreek} \\setlength{\\oddsidemargin}{-69pt} \\begin{document}$\\mathcal{O}(n/\\epsilon ^{2})$\\end{document} and O(n(21ϵ))\\documentclass[12pt]{minimal} \\usepackage{amsmath} \\usepackage{wasysym} \\usepackage{amsfonts} \\usepackage{amssymb} \\usepackage{amsbsy} \\usepackage{mathrsfs} \\usepackage{upgreek} \\setlength{\\oddsidemargin}{-69pt} \\begin{document}$\\mathcal{O}(n(2^{\\frac{1}{\\epsilon}}))$\\end{document} for convex and nonconvex cases, respectively. The efficacy of the algorithm is verified on black-box binary classification problems against several competing alternatives.

Full Text