Abstract

In this work, we present a computationally efficient Reinforcement Learning (RL) search heuristic for finding high-quality mappings of N-dimensional convolution loops. The heuristic guides the search with a computationally inexpensive reward function based on the potential data reuse of operands. We also present an RL state representation that generalizes to N-dimensional convolution loops, together with a state-parsing strategy that ensures only valid mappings are evaluated for quality. Our RL search heuristic applies to multi-core systems with a memory hierarchy. For a range of 3D convolution layers, mapped onto an architecture with multiple processing elements sharing a memory connected to DRAM, our heuristic generally yields mappings with lower Energy-Delay Product (EDP) than random search, at significantly lower computational expense. Evaluated across 19 3D convolution layers, our RL method required on average only 11.24% of the operations of Timeloop’s random search to assess the same number of valid mappings. The mappings found with Timeloop had, on average, 12.51% higher EDP than the lowest-EDP mappings found with our RL method. Furthermore, the lowest-EDP mappings found with our method were on average only 4.69× above the theoretical lower-bound EDP, and only 1.29× above it in the best case.
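To make the evaluation metric concrete: EDP is the product of total energy consumed and execution delay, so a mapping is penalized both for costly data movement and for long runtimes, while the reward signal mentioned above only needs an estimate of operand reuse rather than a full cost-model run per search step. The sketch below is a minimal illustration of such a metric and a reuse-based reward proxy; the data structures, cost constants, and function names are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: a toy Energy-Delay Product (EDP) metric and a
# reuse-based reward proxy. Names, cost constants, and the mapping-statistics
# format are hypothetical, not taken from the paper or from Timeloop.

from dataclasses import dataclass


@dataclass
class MappingStats:
    """Counters an architecture model might report for one candidate mapping."""
    dram_accesses: int        # word accesses to DRAM
    shared_buf_accesses: int  # word accesses to the shared on-chip memory
    macs: int                 # multiply-accumulate operations
    cycles: int               # execution latency in cycles


def energy_delay_product(stats: MappingStats,
                         e_dram: float = 200.0,
                         e_buf: float = 6.0,
                         e_mac: float = 1.0) -> float:
    """EDP = total energy x delay; lower is better.

    The per-access energy weights are placeholder values in arbitrary units.
    """
    energy = (stats.dram_accesses * e_dram
              + stats.shared_buf_accesses * e_buf
              + stats.macs * e_mac)
    return energy * stats.cycles


def reuse_reward(algorithmic_accesses: int, actual_dram_accesses: int) -> float:
    """Cheap reward proxy: how much operand reuse a mapping exploits.

    A mapping that refetches every operand from DRAM scores near 0; one that
    keeps operands resident in the shared buffer and reuses them scores
    closer to 1. Computing this avoids a full cost-model evaluation per step.
    """
    if algorithmic_accesses <= 0:
        return 0.0
    return max(0.0, 1.0 - actual_dram_accesses / algorithmic_accesses)


if __name__ == "__main__":
    candidate = MappingStats(dram_accesses=120_000,
                             shared_buf_accesses=2_400_000,
                             macs=9_000_000,
                             cycles=450_000)
    print("EDP:", energy_delay_product(candidate))
    print("reuse reward:", reuse_reward(algorithmic_accesses=2_400_000,
                                        actual_dram_accesses=120_000))
```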
