Abstract

Deep reinforcement learning has been successfully applied to a variety of applications and achieves impressive performance compared with traditional methods, but it suffers from high computation cost and long training time. MLPerf includes deep reinforcement learning as one of its benchmark tracks and provides a single-node training version of MiniGo as a reference implementation. A key challenge is to achieve efficient MiniGo training on a large-scale computing system. Based on the training computation pattern of MiniGo and the characteristics of our large-scale heterogeneous computing system, we propose a MultiLevel Parallel strategy, MLPs, comprising task-level parallelism between nodes, CPU-DSP heterogeneous parallelism, and DSP multi-core parallelism. The proposed method reduces the overall execution time from 43 hours to 16 hours while scaling the number of nodes from 1067 to 4139, a scaling efficiency of 69.1%. According to our fitting method, the scaling efficiency is 46.5% when scaling to 8235 nodes. The experimental results show that the proposed method achieves efficient training of MiniGo on a large-scale heterogeneous computing system.
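The reported efficiency figure can be checked against the abstract's own numbers. A minimal sketch, assuming the standard definition of scaling efficiency as achieved speedup divided by the increase in node count (the paper's exact formula is not stated in the abstract):

```python
# Reproducing the reported scaling efficiency from the abstract's figures.
# Definition assumed here: efficiency = speedup / node-count ratio.

def scaling_efficiency(t_base, t_scaled, nodes_base, nodes_scaled):
    """Achieved speedup divided by the factor of additional nodes used."""
    speedup = t_base / t_scaled          # 43 h -> 16 h gives ~2.69x
    node_ratio = nodes_scaled / nodes_base  # 1067 -> 4139 nodes, ~3.88x
    return speedup / node_ratio

eff = scaling_efficiency(t_base=43, t_scaled=16,
                         nodes_base=1067, nodes_scaled=4139)
print(f"{eff:.1%}")  # close to the reported 69.1%
```

The small residual difference from 69.1% likely reflects rounding in the reported execution times.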
