Fast and Inexpensive High-Level Synthesis Design Space Exploration: Machine Learning to the Rescue

Md Imtiaz Rashid, Benjamin Carrion Schafer

doi:10.1109/tcad.2023.3258341

Abstract

High-Level Synthesis (HLS) has multiple significant advantages over traditional RT-level design flows. One in particular that we address in this work is the ability to generate multiple functionally equivalent design variants with unique trade-offs in area, performance and power from the same behavioral description. This is typically done by setting synthesis options in the form of pragmas (comments) that mainly control how to synthesize arrays (RAM or registers), loops (unroll, partially unroll, no unroll or pipeline) and functions (inline or not). Different pragma combinations lead to different design implementations. Out of all the pragma combinations, the designer is typically only interested in those that lead to Pareto-optimal designs. Fortunately, this search can be automated, but unfortunately, the search space of pragma combinations grows supra-linearly with the number of pragma settings. Thus, fast and efficient heuristics are needed. These heuristics generate a new pragma combination and then evaluate its effect by synthesizing (HLS) it. The most time-consuming part of this process is having to execute a full synthesis (HLS) of the behavioral description for every new pragma combination. One obvious way to accelerate the exploration is to parallelize it using a multithreaded heuristic. In theory, the speedup should match the number of parallel threads. The main problem with this approach is that every HLS invocation requires checking out an HLS tool license, which is not released until the synthesis process has finished. This implies that the maximum number of parallel threads is restricted by the number of available licenses, which in the ASIC case are extremely expensive. In contrast, FPGA vendors make their HLS tools free. Thus, it is tempting to investigate whether FPGA HLS tools can be used to find the Pareto-optimal designs in the ASIC case.
To address this, we present a dedicated multithreaded parallel HLS design space explorer (DSE) based on transfer learning. It accelerates HLS DSE for ASICs by first targeting FPGAs and then using machine learning to convert the exploration results obtained into their optimal ASIC equivalents. Experimental results show the effectiveness and robustness of our approach.
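
The two notions the abstract relies on, the space of pragma combinations and the Pareto-optimal subset of the resulting designs, can be illustrated with a minimal Python sketch. It is not the explorer presented in this work: the knob names in `KNOBS`, the helper names, and the (area, latency) numbers are hypothetical, and an exhaustive enumeration stands in for the heuristics that a real DSE would use. Both objectives are assumed to be minimized.

```python
from itertools import product

# Hypothetical pragma sites and their allowed settings (illustrative only).
KNOBS = {
    "array_A": ["ram", "registers"],
    "loop_1": ["no_unroll", "unroll_partial", "unroll_full", "pipeline"],
    "func_f": ["inline", "no_inline"],
}


def enumerate_combinations(knobs):
    """Yield every pragma combination as a dict {site: setting}.

    The number of combinations is the product of the per-site option
    counts, which is why exhaustive search quickly becomes impractical.
    """
    names = sorted(knobs)
    for values in product(*(knobs[name] for name in names)):
        yield dict(zip(names, values))


def pareto_front(points):
    """Return the non-dominated (area, latency) points; lower is better.

    A point p is dominated if some other point q is no worse than p in
    both objectives (and differs from p).
    """
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] <= p[0] and q[1] <= p[1] for q in points
        )
        if not dominated:
            front.append(p)
    return front


if __name__ == "__main__":
    combos = list(enumerate_combinations(KNOBS))
    print(len(combos), "pragma combinations")

    # Made-up synthesis results, standing in for real HLS reports.
    results = [(10, 5), (8, 7), (12, 4), (11, 6), (9, 9)]
    print("Pareto-optimal designs:", pareto_front(results))
```

In a real flow, each combination would be evaluated by one HLS run, so the number of runs that can proceed in parallel is bounded by the number of available tool licenses, as discussed above.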

Full Text