Abstract
In machine learning, and in deep learning in particular, the need to train large models on massive datasets has driven research into efficient distributed optimization. These approaches aim to improve model accuracy and generalization while reducing the communication overhead inherent to distributed training. Lowering communication cost is important because communication between workers is often the main performance bottleneck in distributed systems. This literature review traces the progression of communication-efficient techniques, culminating in the Slow Momentum (SLOWMO) framework. It follows the development of distributed optimization, discusses decentralization strategies, and highlights the role of Local Stochastic Gradient Descent (Local SGD) and momentum in communication-efficient algorithms. It then examines Block-Wise Model Update Filtering (BMUF) and presents SLOWMO, a framework that consistently improves optimization and generalization performance across a variety of base algorithms. Overall, the review surveys the field of communication-efficient distributed optimization, covering both theoretical guarantees and practical insights for improving large-scale model training.