A Framework for Mapping DRL Algorithms With Prioritized Replay Buffer Onto Heterogeneous Platforms

Chi Zhang,Viktor Prasanna,Yuan Meng

doi:10.1109/tpds.2023.3264823

Abstract

Despite the recent success of Deep Reinforcement Learning (DRL) in self-driving cars, robotics and surveillance, training DRL agents takes tremendous amount of time and computation resources. In this paper, we aim to accelerate DRL with Prioritized Replay Buffer due to its state-of-the-art performance on various benchmarks. The computation primitives of DRL with Prioritized Replay Buffer include environment emulation, neural network inference, sampling from Prioritized Replay Buffer, updating Prioritized Replay Buffer and neural network training. The speed of running these primitives varies for various DRL algorithms such as Deep Q Network and Deep Deterministic Policy Gradient. This makes a fixed mapping of DRL algorithms inefficient. In this work, we propose a framework for mapping DRL algorithms onto heterogeneous platforms consisting of a multi-core CPU, a GPU and a FPGA. First, we develop specific accelerators for each primitive on CPU, FPGA and GPU. Second, we relax the data dependency between priority update and sampling performed in the Prioritized Replay Buffer. By doing so, the latency caused by data transfer between GPU, FPGA and CPU can be completely hidden without sacrificing the rewards achieved by agents learned using the target DRL algorithms. Finally, given a DRL algorithm specification, our design space exploration automatically chooses the optimal mapping of various primitives based on an analytical performance model. On widely used benchmark environments, our experimental results demonstrate up to 997.3× improvement in training throughput compared with baseline mappings on the same heterogeneous platform. Compared with the state-of-the-art distributed Reinforcement Learning framework RLlib, we achieve 1.06 <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$\times \sim$</tex-math></inline-formula> 1005× improvement in training throughput.

Full Text