This paper presents RL-Scheduling, a novel reinforcement learning (RL)-based framework for optimizing dataflow scheduling in Deep Neural Network (DNN) accelerators. As DNNs grow increasingly complex, efficient hardware accelerators such as TPUs and custom-designed ASICs are essential to meet performance and energy-efficiency demands. However, optimizing dataflow scheduling remains challenging due to the vast design space and dynamic hardware constraints. The proposed framework uses Proximal Policy Optimization (PPO) to dynamically adjust scheduling strategies: after the RL agent selects the rows of the schedule to optimize, a brute-force search finds the optimal solutions for those rows, ensuring that the resulting schedule satisfies both the DNN parameters and the hardware resource constraints. We validated the framework on several DNN models, including YOLO v3, Inception v4, MobileNet v3, and ResNet-50, across multiple accelerator architectures such as Eyeriss, TPU v3, and Simba. The experimental results show substantial improvements, with RL-Scheduling achieving up to a 65.6% reduction in execution cycles and a 59.7% improvement in energy efficiency over existing scheduling algorithms. Additionally, the scheduling algorithm itself executes more efficiently than most existing approaches.
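To make the two-stage idea in the abstract concrete, the following minimal sketch illustrates how an agent-selected subset of schedule rows could be re-optimized by brute-force search under constraint checks. It is not the authors' implementation: the paper uses a trained PPO agent, whereas here `select_rows_with_policy`, the cost model, the constraint check, and the candidate values are all hypothetical stand-ins.

```python
# Minimal sketch (not the paper's code): an RL policy chooses which rows of a
# scheduling table to re-optimize, then a brute-force search finds the best
# feasible values for just those rows. All names and the toy cost/constraint
# models below are assumptions for illustration only.
import itertools
import random

N_ROWS = 4                  # rows of the scheduling table (assumed small here)
CANDIDATES = [1, 2, 4, 8]   # assumed candidate values per row (e.g., tile sizes)

def cost_model(schedule):
    """Stand-in for a cycle/energy estimate of a schedule."""
    return sum((v - 3) ** 2 for v in schedule)

def hardware_ok(schedule):
    """Stand-in for DNN-parameter / hardware-resource constraint checks."""
    return sum(schedule) <= 20

def select_rows_with_policy(schedule, k=2):
    """Placeholder for the PPO agent: the paper trains a policy to pick the
    rows worth optimizing; here we simply sample k rows at random."""
    return random.sample(range(len(schedule)), k)

def brute_force_rows(schedule, rows):
    """Exhaustively try all candidate values for the selected rows and keep
    the feasible assignment with the lowest estimated cost."""
    best, best_cost = list(schedule), cost_model(schedule)
    for combo in itertools.product(CANDIDATES, repeat=len(rows)):
        trial = list(schedule)
        for r, v in zip(rows, combo):
            trial[r] = v
        if hardware_ok(trial) and cost_model(trial) < best_cost:
            best, best_cost = trial, cost_model(trial)
    return best, best_cost

if __name__ == "__main__":
    schedule = [8] * N_ROWS                 # arbitrary initial schedule
    for step in range(5):                   # outer loop the RL agent would drive
        rows = select_rows_with_policy(schedule)
        schedule, cost = brute_force_rows(schedule, rows)
        print(f"step {step}: rows={rows} schedule={schedule} cost={cost}")
```

In the actual framework, the PPO policy would be rewarded based on the improvement in execution cycles or energy reported by the accelerator cost model, so that row selection improves over training rather than being random as in this sketch.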