Performance optimization is an important goal for High-level Synthesis (HLS). Existing HLS scheduling algorithms are all based on Control and Data Flow Graph (CDFG) and will schedule basic blocks in sequential order. Our study shows that the sequential scheduling order of basic blocks is a big limiting factor for achievable circuit performance. In this article, we propose a Dependency Graph (DG) with two important properties for scheduling. First, DG is a directed acyclic graph. Thus, no loop breaking heuristic is needed for scheduling. Second, DG can be used to identify the exact instruction parallelism. Our experiment shows that DG can lead to 76% instruction parallelism increase over CDFG. Based on DG, we propose a bottom-up scheduling algorithm to achieve much higher instruction parallelism than existing algorithms. Hierarchical state transition graph with guard conditions is proposed for efficient implementation of such high parallelism scheduling. Our experimental results show that our DG-based HLS algorithm can outperform the CDFG-based LegUp and the state-of-the-art industrial tool Vivado HLS by 2.88× and 1.29× on circuit latency, respectively.
Read full abstract