Abstract
In recent years, the pace of innovation in the field of machine learning (ML) has accelerated, and researchers in SysML have created algorithms and systems that parallelize ML training over multiple devices or computational nodes. As ML models become more structurally complex, many systems have struggled to provide consistent performance across a variety of models. In particular, the amount of knowledge and time required to map an appropriate distribution strategy to a model is usually underestimated when scaling ML up. Applying parallel training systems to complex models adds nontrivial development overhead on top of model prototyping, and often results in lower-than-expected performance. This thesis identifies and addresses research challenges in both the usability and the performance of parallel ML techniques and system implementations.

The first part of this thesis presents a simple design principle, adaptive parallelism, which applies suitable parallelization techniques to model building blocks (e.g., layers) according to their specific ML properties. Following this principle, we derive a series of optimizations and implementations, each targeting a different aspect of ML parallelization. We examine them and show that they boost the efficiency or scalability of ML training on clusters by 2-10x in their applicable scenarios.

Generalizing this methodology, the second part of this thesis formulates ML parallelization as an end-to-end optimization problem and seeks to solve it automatically for two broad paradigms of ML parallelization tasks: single-node dynamic batching and distributed ML parallelisms. We present principled representations to express these two classes of ML parallelisms, along with composable system architectures, Cavs and AutoDist, respectively. They enable rapid composition of parallelization strategies for unseen models, improve parallelization performance, and simplify parallel ML programming.

On top of them, the third part of this thesis presents an automatic parallelization framework, AutoSync, which optimizes synchronization strategies in data-parallel distributed training. AutoSync achieves high performance "out of the box": it navigates the space spanned by the proposed representation and automatically identifies synchronization strategies that report 1.2-1.6x speedups over existing hand-optimized systems, lowering the technical barrier of distributed ML and helping make it accessible to a larger community of users. Collectively, the techniques and systems developed in this thesis lead to a proof of concept and a prototype implementation of an end-to-end compiler system for large-scale ML training in distributed environments.