Abstract

The Parameter Server (PS)-based architecture is widely applied in distributed machine learning (DML), but how to improve DML performance in this framework remains an open issue. Existing works mainly approach this problem from the worker side. In this paper, we tackle it from another perspective, by leveraging the central control held by the PS. Specifically, we propose SmartPS, which transforms the PS from its passive role in traditional DML and fully exploits its intelligence. First, the PS holds a global view of parameter dependencies, enabling it to update workers' parameters selectively and proactively. Second, the PS records the workers' speeds and prioritizes parameter transmission to narrow the gap between stragglers and fast workers. Third, the PS considers parameter dependencies across consecutive training iterations and opportunistically blocks unnecessary pushes from workers. We conduct comparative experiments with two typical benchmarks, Matrix Factorization (MF) and PageRank (PR). The experimental results show that, compared with all the baseline algorithms (i.e., standard BSP, ASP, and SSP), SmartPS reduces the overall training time by 65.7% to 84.9% while achieving the same training accuracy.
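The three PS-side mechanisms summarized above (selective proactive push, straggler-first scheduling, and opportunistic blocking of stale pushes) can be sketched as follows. This is a minimal illustrative sketch only; all class, method, and field names (e.g., SmartPSSketch, proactive_push, handle_push) are assumptions for exposition and are not taken from the SmartPS implementation.

```python
# Hypothetical sketch of the PS-side logic described in the abstract.
import time
from collections import defaultdict


class SmartPSSketch:
    def __init__(self, dependency_map):
        # dependency_map[worker_id] -> set of parameter keys that worker reads
        self.dependency_map = dependency_map
        self.params = defaultdict(float)       # global parameter store
        self.last_finish = defaultdict(float)  # recorded per-worker finish times

    def record_speed(self, worker_id):
        """Record when each worker last finished an iteration (its observed speed)."""
        self.last_finish[worker_id] = time.time()

    def push_order(self, workers):
        """Prioritize stragglers: serve the slowest (oldest finish time) workers first."""
        return sorted(workers, key=lambda w: self.last_finish[w])

    def proactive_push(self, worker_id):
        """Selectively push only the parameters this worker depends on."""
        return {k: self.params[k] for k in self.dependency_map[worker_id]}

    def handle_push(self, worker_id, updates, needed_next_iter):
        """Apply a worker's pushed updates, but drop updates to parameters that
        no worker needs in the next iteration (opportunistic blocking)."""
        for key, delta in updates.items():
            if key in needed_next_iter:
                self.params[key] += delta
            # else: the push is blocked, saving bandwidth and synchronization cost
```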
