Abstract

Stochastic Gradient Descent (SGD) is a popular optimization method used to train a variety of machine learning models. Most work on SGD to date has concentrated on improving its statistical efficiency, i.e., its rate of convergence to the optimal solution. At the same time, as the parallelism of modern CPUs continues to increase through progressively higher core counts, it is imperative to understand the hardware efficiency of parallel SGD, which is often at odds with its statistical efficiency. In this paper, we explore several modern parallelization methods for SGD on a shared-memory system, in the context of sparse, convex optimization problems. Specifically, we develop optimized parallel implementations of several SGD algorithms and show that their parallel efficiency is severely limited by inter-core communication. We propose a new, scalable, communication-avoiding, many-core-friendly implementation of SGD, called HogBatch, which exposes parallelism on several levels, minimizes the impact on statistical efficiency, and, as a result, significantly outperforms the other methods. On a variety of datasets, HogBatch demonstrates near-linear scalability on a system with 14 cores and delivers up to a 20X speedup over previous methods.
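
The paper's HogBatch implementation is not reproduced here; the following is a minimal illustrative sketch (an assumption, not the authors' code) of the communication-avoiding idea the abstract describes: each thread accumulates gradients over a thread-private minibatch and writes to the shared model only once per batch, rather than once per sample as in fully asynchronous (Hogwild-style) SGD. The logistic-regression objective, dense synthetic data, OpenMP, and all parameter values below are assumptions chosen for illustration.

// Minimal sketch (assumed, not the paper's code): communication-avoiding
// minibatch SGD for logistic regression on a shared-memory machine.
// Each thread accumulates a gradient over a private minibatch and writes
// to the shared model once per batch, reducing inter-core cache traffic
// relative to per-sample asynchronous updates.
// Build: g++ -O2 -fopenmp sketch.cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    const int n = 10000, d = 64, batch = 256, epochs = 5;
    const double lr = 0.1;  // illustrative learning rate, not from the paper

    // Synthetic dense data; the paper targets sparse problems, but dense
    // data keeps the sketch short.
    std::mt19937 rng(42);
    std::normal_distribution<double> gauss(0.0, 1.0);
    std::vector<double> X(n * d), w_true(d);
    std::vector<int> y(n);
    for (auto& v : w_true) v = gauss(rng);
    for (int i = 0; i < n; ++i) {
        double dot = 0.0;
        for (int j = 0; j < d; ++j) {
            X[i * d + j] = gauss(rng);
            dot += X[i * d + j] * w_true[j];
        }
        y[i] = dot > 0 ? 1 : 0;
    }

    std::vector<double> w(d, 0.0);  // shared model vector
    for (int e = 0; e < epochs; ++e) {
        #pragma omp parallel
        {
            std::vector<double> g(d);  // thread-private gradient buffer
            #pragma omp for schedule(static)
            for (int b = 0; b < n / batch; ++b) {
                std::fill(g.begin(), g.end(), 0.0);
                for (int i = b * batch; i < (b + 1) * batch; ++i) {
                    double dot = 0.0;
                    for (int j = 0; j < d; ++j) dot += X[i * d + j] * w[j];
                    double p = 1.0 / (1.0 + std::exp(-dot));  // sigmoid
                    double err = p - y[i];                    // dLoss/dDot
                    for (int j = 0; j < d; ++j) g[j] += err * X[i * d + j];
                }
                // One shared-model update per minibatch. Hogwild-style SGD
                // would even drop the atomics and tolerate the races.
                for (int j = 0; j < d; ++j) {
                    #pragma omp atomic
                    w[j] -= lr * g[j] / batch;
                }
            }
        }
    }
    std::printf("w[0] = %.3f (true %.3f)\n", w[0], w_true[0]);
    return 0;
}

The key design point in this sketch is the per-batch update frequency: the shared vector w is read freely but written only n/batch times per epoch per thread, which is what limits cache-line ping-pong between cores.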
