High-level Synthesis Design Research Articles

High-Level Synthesis (HLS) has several significant advantages over traditional RT-level design flows. One in particular that we address in this work is the ability to generate multiple functionally equivalent design variants with unique trade-offs in area, performance and power from the same behavioral description. This is typically done by setting synthesis options in the form of pragmas (comments) that mainly control how to synthesize arrays (RAM or registers), loops (unroll, partially unroll, no unroll or pipeline) and functions (inline or not). Different pragma combinations lead to different design implementations. Out of all the pragma combinations, the designer is typically only interested in those that lead to Pareto-optimal designs. Fortunately, this search can be automated; unfortunately, the search space of pragma combinations grows supra-linearly with the number of pragma settings. Thus, fast and efficient heuristics are needed. These heuristics generate a new pragma combination and then evaluate its effect by running HLS on it. The most time-consuming part of this process is having to execute a full synthesis (HLS) of the behavioral description for every new pragma combination. One obvious way to accelerate the exploration is to parallelize it using a multithreaded heuristic. The theoretical speedup should match the number of parallel threads. The main problem with this approach is that every HLS invocation requires checking out an HLS tool license, which is not released until the synthesis process has finished. This implies that the maximum number of parallel threads is restricted by the number of available licenses, which in the ASIC case are extremely expensive. In contrast, FPGA vendors make their HLS tools available for free. Thus, it is tempting to investigate whether FPGA HLS tools can be used to find the Pareto-optimal designs in the ASIC case.
To address this, we present a dedicated multithreaded parallel HLS design space explorer (DSE) based on transfer learning. It accelerates HLS DSE for ASICs by first targeting FPGAs and then using machine learning to convert the exploration results obtained into the optimal ASIC equivalents. Experimental results show the effectiveness and robustness of our approach.
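
The exploration loop described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' explorer: the pragma knobs in `PRAGMA_SPACE`, the toy cost model inside `synthesize`, and the license count `NUM_LICENSES` are all hypothetical stand-ins. The sketch enumerates pragma combinations, runs one (simulated) synthesis per combination in a thread pool whose effective concurrency is capped by a semaphore playing the role of the license pool, and then keeps only the Pareto-optimal (area, latency) points.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
import threading

# Hypothetical pragma knobs; real HLS tools expose many more settings.
PRAGMA_SPACE = {
    "array": ["ram", "registers"],
    "loop": ["none", "unroll", "pipeline"],
    "function": ["inline", "no_inline"],
}

NUM_LICENSES = 2  # assumed number of HLS licenses available
license_pool = threading.BoundedSemaphore(NUM_LICENSES)


def synthesize(combo):
    """Stand-in for one HLS run; returns a made-up (area, latency) pair."""
    with license_pool:  # the license is held until the run finishes
        area, latency = 100, 100
        if combo["array"] == "registers":
            area, latency = area + 40, latency - 20
        if combo["loop"] == "unroll":
            area, latency = area + 60, latency - 40
        elif combo["loop"] == "pipeline":
            area, latency = area + 30, latency - 30
        if combo["function"] == "inline":
            area, latency = area + 10, latency - 5
        return area, latency


def pareto_front(points):
    """Keep points that no other point dominates (both objectives minimized)."""
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] <= p[0] and q[1] <= p[1] for q in points
        )
        if not dominated:
            front.append(p)
    return front


def explore(max_workers=4):
    """Evaluate every pragma combination; only NUM_LICENSES run at once."""
    keys = list(PRAGMA_SPACE)
    combos = [dict(zip(keys, values)) for values in product(*PRAGMA_SPACE.values())]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(synthesize, combos))
    return combos, results


if __name__ == "__main__":
    combos, results = explore()
    for point in pareto_front(results):
        print(point)
```

Even with `max_workers=4`, at most `NUM_LICENSES` synthesis runs proceed at the same time, which is the bottleneck the abstract attributes to license checkout. An exhaustive sweep is used here only because the toy space has 12 combinations; the abstract's point is that realistic spaces are too large for this and need heuristics.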


Visual Odometry (VO) systems are widely used to determine the position and orientation of a robot or camera in an unknown environment. They are deployed on resource-constrained platforms, such as drones and Virtual Reality (VR) or Augmented Reality (AR) headsets. VO systems that harness modern Systems-on-Chip (SoCs) with integrated Field-Programmable Gate Arrays (FPGAs) have the potential to improve overall system performance. This paper explores the FPGA acceleration of sparse VO kernels using High-Level Synthesis (HLS), as this kind of VO system has been designed for use with low-power SoCs. We show that both the computational overhead and the data transfer overhead between the CPU cores of the SoC and the accelerators on the FPGA need to be optimized to obtain better end-to-end performance. This is a result of the additional data movement incurred when using an FPGA accelerator, and also of the sparse computational nature of the kernels involved, which have predictable or random memory access patterns. However, state-of-the-art HLS tools are not yet able to perform the required optimizations automatically, because they usually assume that the kernels to be accelerated have dense computational patterns with regular memory access. In this paper we propose three potentially generic methods to reduce the data transfer between the CPU and the customised hardware kernels on the FPGA: (a) approximation based on domain-specific knowledge, (b) image compression, and (c) on-the-fly computation. We present a case study of these methods on SVO, a state-of-the-art sparse VO system with a semi-direct front-end. We demonstrate that our proposed methods can reduce data transfer overhead to achieve better end-to-end performance, and that they can be applied not only with standard Xilinx HLS tools but also with other state-of-the-art HLS tools, such as HeteroFlow.
Compared to the baseline performance of the original SVO software on an Arm CPU, our proposed methods help the HLS and HeteroFlow designs achieve speedups of 2.4x and 2.14x, respectively, without noticeable accuracy loss. The HLS and HeteroFlow designs also achieve 1.85x and 1.89x improvements, respectively, in energy efficiency on the SoC system used. Compared to the SVO software baseline running on the Intel Xeon CPU, our proposed methods help the HLS and HeteroFlow designs achieve 8.2x and 8.3x improvements in energy efficiency, respectively.


Articles published on High-level Synthesis Design

CollectiveHLS: A Collaborative Approach to High-Level Synthesis Design Optimization

Decomposition based estimation of distribution algorithm for high-level synthesis design space exploration

PASTA: Programming and Automation Support for Scalable Task-Parallel HLS Programs on Modern Multi-Die FPGAs

FADO: Floorplan-Aware Directive Optimization Based on Synthesis and Analytical Models for High-Level Synthesis Designs on Multi-Die FPGAs

Fast and Inexpensive High-Level Synthesis Design Space Exploration: Machine Learning to the Rescue

HLS‐based swarm intelligence driven optimized hardware IP core for linear regression‐based machine learning

FPGA implementation of QUasi-Affine TRansformation evolutionary algorithm

AutoScaleDSE: A Scalable Design Space Exploration Engine for High-Level Synthesis

High-Level Synthesis Design of Scalable Ultrafast Ultrasound Beamformer With Single FPGA

A Parameterized Parallel Design Approach to Efficient Mapping of CNNs onto FPGA

Exploring Sparse Visual Odometry Acceleration With High-Level Synthesis

Graph Neural Networks for High-Level Synthesis Design Space Exploration

Machine learning based fast and accurate High Level Synthesis design space exploration: From graph to synthesis

PathDriver+: Enhanced Path-Driven Architecture Design for Flow-Based Microfluidic Biochips

FastSim: A Fast Simulation Framework for High-Level Synthesis

Learning from the Past: Efficient High-level Synthesis Design Space Exploration for FPGAs

Unified FPGA Design for the HEVC Dequantization and Inverse Transform Modules

DB4HLS: A Database of High-Level Synthesis Design Space Explorations

AutoBridge: Coupling Coarse-Grained Floorplanning and Pipelining for High-Frequency HLS Design on Multi-Die FPGAs

Cluster-Based Heuristic for High Level Synthesis Design Space Exploration
