Abstract
FPGAs facilitate prototyping and debugging and, owing to their fast simulation speed, have recently been used to accelerate full-stack simulations. However, the FPGA toolflow's turnaround time (TAT) is restrictive in exhaustive design-space explorations of parameterized RTL generators, especially DNN accelerators, which unleash an explosive full-stack search space. This paper presents Quickloop, an efficient and scalable framework that enables FPGA-accelerated exploration. Quickloop first abstracts away the cumbersome flow of RTL generation, software-stack compilation, the FPGA toolflow, workload execution, and metrics extraction by wrapping these stages into isolated Quicksteps, which feature cascadability, scalability, and replay. We then analytically minimize the FPGA toolflow TAT via a novel, data-driven strategy that intelligently reuses build fragments from previous iterations, improving loop efficiency while simultaneously lowering the toolflow's compute utilization. Quickloop is built around the OpenAI Gym environment framework and thus supports drop-in regression and reinforcement-learning explorations. Using a Quickloop around Berkeley's reference Gemmini DNN accelerator, we exhaustively explore its parameter space and discover complex performance patterns based on full-stack simulation of ImageNet benchmarks as the workload. Compared to a conventional FPGA toolflow, we further show that Quickloop reduces episode time by more than 30% as episodes approach realistic lengths.
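The Quickstep abstraction described above can be sketched as a small pipeline of isolated, cascadable stages whose results are cached so that repeated exploration iterations can replay finished stages rather than re-run them. This is a minimal illustrative sketch only; the class names, stage functions, and configuration keys below are hypothetical and do not reflect Quickloop's actual API.

```python
# Hypothetical sketch of cascadable Quicksteps with replay (result caching).
# Each stage (RTL generation, FPGA build, workload run) is wrapped as an
# isolated step; cached outputs stand in for reused build fragments.

class Quickstep:
    def __init__(self, name, fn):
        self.name = name
        self.fn = fn
        self._cache = {}  # replay: memoize results per configuration

    def run(self, config):
        key = tuple(sorted(config.items()))
        if key not in self._cache:        # recompute only on a cache miss
            self._cache[key] = self.fn(config)
        return self._cache[key]

def cascade(steps, config):
    """Run steps in order, merging each stage's outputs into a shared config."""
    result = dict(config)
    for step in steps:
        result.update(step.run(result) or {})
    return result

# Toy stage functions standing in for the real toolflow stages.
def rtl_gen(cfg):      return {"rtl": f"gemmini_{cfg['pe_rows']}x{cfg['pe_cols']}"}
def fpga_build(cfg):   return {"bitstream": cfg["rtl"] + ".bit"}
def run_workload(cfg): return {"latency_ms": cfg["pe_rows"] * cfg["pe_cols"]}

pipeline = [Quickstep("rtl", rtl_gen),
            Quickstep("build", fpga_build),
            Quickstep("run", run_workload)]

metrics = cascade(pipeline, {"pe_rows": 4, "pe_cols": 4})
```

In a reinforcement-learning setting, such a cascade would sit inside a Gym environment's `step()` method, with the accelerator configuration as the action and the extracted metrics as the reward signal.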