Abstract

In recent years, distributed stochastic algorithms have become increasingly useful in machine learning. However, like traditional stochastic algorithms, they face the problem that achieving a good fit on the training set does not necessarily yield good performance on the test set. To address this issue, we propose to exploit a distributed network topology to improve the generalization ability of such algorithms. We focus specifically on the Sharpness-Aware Minimization (SAM) algorithm, which perturbs the weights toward the locally worst-case (maximum-loss) point in order to find solutions with better generalization ability. In this paper, we present the decentralized stochastic sharpness-aware minimization (D-SSAM) algorithm, which incorporates the distributed network topology. We also provide sublinear convergence results for non-convex targets, comparable to those of Decentralized Stochastic Gradient Descent (DSGD). Finally, we empirically demonstrate the effectiveness of these results on deep networks and discuss their relationship to the generalization behavior of SAM.
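To make the setting concrete, the sketch below illustrates one synchronous round of a D-SSAM-style update under common decentralized-optimization assumptions: each node mixes its weights with its neighbors through a doubly stochastic matrix and then takes a SAM gradient step, i.e., a gradient evaluated at a normalized ascent perturbation of its weights. The function names, step sizes, and toy quadratic objectives are illustrative assumptions only, not the authors' implementation.

import numpy as np

def sam_gradient(loss_grad, w, rho=0.05):
    # Gradient evaluated at the locally worst-case (maximum-loss) perturbation,
    # as in standard SAM: take one normalized ascent step, then differentiate there.
    g = loss_grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    return loss_grad(w + eps)

def dssam_step(weights, loss_grads, W, lr=0.1, rho=0.05):
    # One synchronous round (illustrative form): each node averages with its
    # neighbors via the mixing matrix W, then takes a local SAM gradient step.
    n = len(weights)
    mixed = [sum(W[i, j] * weights[j] for j in range(n)) for i in range(n)]
    return [mixed[i] - lr * sam_gradient(loss_grads[i], mixed[i], rho) for i in range(n)]

# Toy example: three nodes with different quadratic objectives.
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([-1.0, 0.0])]
loss_grads = [lambda w, t=t: w - t for t in targets]        # gradient of 0.5 * ||w - t||^2
W = np.array([[0.5, 0.25, 0.25],
              [0.25, 0.5, 0.25],
              [0.25, 0.25, 0.5]])                           # doubly stochastic mixing matrix
weights = [np.zeros(2) for _ in range(3)]
for _ in range(200):
    weights = dssam_step(weights, loss_grads, W)
print("approximate consensus point:", np.mean(weights, axis=0))

Running this toy example, the nodes reach approximate consensus near the average of the individual minima, which is the qualitative behavior one expects from gossip-based decentralized methods such as DSGD.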
