Recent breakthroughs in Neural Networks (NNs) have led to significant accuracy improvements in several machine learning applications, such as image classification and voice recognition. However, these accuracy improvements come at the cost of an immense increase in computational demands, and NNs have become one of the most common and computationally intensive workloads in today's datacenters. To address these demands, Google announced the Tensor Processing Unit (TPU) in 2016, a custom ASIC accelerator for NN inference. Two new TPU versions (v2 and v3), which also support training, followed in 2017 and 2018. The Google TPUv3 packs immense processing power ( <inline-formula><tex-math notation="LaTeX">$\mathrm{90TFLOPS}$</tex-math></inline-formula> per chip) into a tiny, dense area, leading to very high on-chip power densities and thus excessive temperatures. In this article, we consider superlattice thermoelectric cooling, one of the emerging on-chip cooling techniques, as an advanced cooling example for the Google TPU, and we investigate the impact of the Negative Capacitance FET (NCFET), one of the recent emerging technologies, on the cooling and efficiency of the TPU. Through a full-chip design of the computational core of the TPU, based on the <inline-formula><tex-math notation="LaTeX">$14\mathrm{nm}$</tex-math></inline-formula> Intel FinFET technology, together with multiphysics temperature simulations, we demonstrate that NCFET can significantly reduce the required cooling cost. More than 4000 NCFET configurations are evaluated in order to traverse the entire design space defined by the thickness of the ferroelectric layer of the NCFET, the operating voltage, the cooling, and the operating frequency, in addition to all possible FinFET configurations. Moreover, our experimental evaluation shows that, by eliminating the cooling cost, NCFET delivers 2.8x higher efficiency compared to the conventional FinFET baseline.