Abstract

The Parameter Server (PS)-based architecture is widely applied in distributed machine learning (DML), but how to improve DML performance in this framework remains an open issue. Existing works mainly approach this problem from the worker side. In this paper, we tackle it from another perspective, by leveraging the central control held by the PS. Specifically, we propose SmartPS, which transforms the PS from its passive role in traditional DML and fully exploits its intelligence. First, the PS holds a global view of parameter dependencies, enabling it to update workers' parameters selectively and proactively. Second, the PS records the workers' speeds and prioritizes parameter transmission to narrow the gap between stragglers and fast workers. Third, the PS considers parameter dependencies across consecutive training iterations and opportunistically blocks unnecessary pushes from workers. We conduct comparative experiments with two typical benchmarks, Matrix Factorization (MF) and PageRank (PR). The experimental results show that, compared with all the baseline algorithms (i.e., standard BSP, ASP, and SSP), SmartPS reduces the overall training time by 65.7% to 84.9% while achieving the same training accuracy.
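The three PS-side mechanisms summarized above (selective proactive push, straggler-first scheduling, and opportunistic blocking of stale pushes) can be sketched as follows. This is a minimal illustrative sketch only; all class, method, and field names (e.g., SmartPSSketch, proactive_push, handle_push) are assumptions for exposition and are not taken from the SmartPS implementation.

```python
# Hypothetical sketch of the PS-side logic described in the abstract.
import time
from collections import defaultdict


class SmartPSSketch:
    def __init__(self, dependency_map):
        # dependency_map[worker_id] -> set of parameter keys that worker reads
        self.dependency_map = dependency_map
        self.params = defaultdict(float)       # global parameter store
        self.last_finish = defaultdict(float)  # recorded per-worker finish times

    def record_speed(self, worker_id):
        """Record when each worker last finished an iteration (its observed speed)."""
        self.last_finish[worker_id] = time.time()

    def push_order(self, workers):
        """Prioritize stragglers: serve the slowest (oldest finish time) workers first."""
        return sorted(workers, key=lambda w: self.last_finish[w])

    def proactive_push(self, worker_id):
        """Selectively push only the parameters this worker depends on."""
        return {k: self.params[k] for k in self.dependency_map[worker_id]}

    def handle_push(self, worker_id, updates, needed_next_iter):
        """Apply a worker's pushed updates, but drop updates to parameters that
        no worker needs in the next iteration (opportunistic blocking)."""
        for key, delta in updates.items():
            if key in needed_next_iter:
                self.params[key] += delta
            # else: the push is blocked, saving bandwidth and synchronization cost
```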
