Embedded Field Programmable Gate Array Research Articles

Both modern datacenter and embedded Field Programmable Gate Arrays (FPGAs) provide great opportunities for high-performance and energy-efficient computing. With the growing public availability of FPGAs from major cloud service providers such as AWS, Alibaba, and Nimbix, as well as uniform hardware accelerator development tools (such as Xilinx Vitis and Intel oneAPI) for software programmers, hardware and software developers can now easily access FPGA platforms. However, it is nontrivial to develop efficient FPGA accelerators, especially for software programmers who use high-level synthesis (HLS). The major goal of this article is to determine how to efficiently access the memory system of modern datacenter and embedded FPGAs in HLS-based accelerator designs. This is especially important for memory-bound applications; for example, a naive accelerator design uses less than 5% of the available off-chip memory bandwidth. To achieve our goal, we first identify a comprehensive set of factors that affect the memory bandwidth, including (1) the clock frequency of the accelerator design, (2) the number of concurrent memory access ports, (3) the data width of each port, (4) the maximum burst access length for each port, and (5) the size of consecutive data accesses. Then, we carefully design a set of HLS-based microbenchmarks to quantitatively evaluate the performance of the memory systems of datacenter FPGAs (Xilinx Alveo U200 and U280) and an embedded FPGA (Xilinx ZCU104) as those factors vary, and we provide insights into efficient memory access in HLS-based accelerator designs. Comparing the soft and hardened memory systems typically found on datacenter and embedded FPGAs, respectively, we further summarize their unique features and discuss effective approaches to leverage these systems.
To demonstrate the usefulness of our insights, we also conduct two case studies to accelerate the widely used K-nearest neighbors (KNN) and sparse matrix-vector multiplication (SpMV) algorithms on datacenter FPGAs with a soft (and thus more flexible) memory system. Compared to the baseline designs, optimized designs leveraging our insights achieve about \(3.5\times\) and \(8.5\times\) speedups for the KNN and SpMV accelerators. Our final optimized KNN and SpMV designs on a Xilinx Alveo U200 FPGA fully utilize its off-chip memory bandwidth, and achieve about \(5.6\times\) and \(3.4\times\) speedups over the 24-core CPU implementations.
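
As a rough illustration of how factors (1) to (3) in the abstract above interact, the following minimal sketch computes an upper bound on the bandwidth an accelerator can request, assuming every port transfers one full word per clock cycle. The function names and the numbers (300 MHz, 32-bit and 512-bit ports) are hypothetical and are not taken from the article; burst length and access size, factors (4) and (5), are not modeled here and would lower the achieved figure in practice.

```python
def peak_bandwidth_gbps(clock_mhz, num_ports, port_width_bits):
    """Upper bound in GB/s, assuming each port moves one full word per cycle."""
    bytes_per_cycle = num_ports * port_width_bits / 8
    return clock_mhz * 1e6 * bytes_per_cycle / 1e9


def utilization(requested_gbps, available_gbps):
    """Fraction of the available off-chip bandwidth a design could use at best."""
    return min(requested_gbps / available_gbps, 1.0)


# Hypothetical design points: same clock and port count, different port widths.
narrow = peak_bandwidth_gbps(clock_mhz=300, num_ports=1, port_width_bits=32)
wide = peak_bandwidth_gbps(clock_mhz=300, num_ports=1, port_width_bits=512)

print(f"32-bit port:  {narrow:.1f} GB/s")
print(f"512-bit port: {wide:.1f} GB/s")
```

Under these assumptions, widening a single port from 32 to 512 bits raises the bound by a factor of 16, which gives a sense of why a narrow, single-port design can leave most of the off-chip bandwidth unused.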

The Joint European Torus (JET) is currently undertaking an enhancement program, in which one of the objectives is to test relevant diagnostics for the International Thermonuclear Experimental Reactor (ITER), the reference for the next generation of fusion experiments. One of the challenges in ITER is the provision of real-time data analysis and compression capabilities, to sustain the expected long-duration discharges and the high acquisition rates achieved by recent data acquisition systems. Foreseeing this real-time requirement, a new system was developed and installed at JET for the gamma-ray and hard X-ray profile monitor diagnostic. The new system, which is connected to 19 CsI(Tl) photodiodes in order to obtain the line-integrated profiles of the gamma-ray and hard X-ray emissions, was designed to overcome the data acquisition limitations of the present fast electron Bremsstrahlung (FEB) diagnostic, while exploiting the required real-time features. This paper presents the real-time processing architecture for the JET gamma-ray and hard X-ray profile monitor. The system hardware, based on the Advanced Telecommunication Computer Architecture (ATCA) standard, includes reconfigurable digitizer modules with embedded Field Programmable Gate Array (FPGA) devices capable of acquiring and simultaneously processing data in real time from the 19 detectors. A suitable algorithm was developed and implemented in the FPGAs, which are able to deliver the energy of each acquired pulse and its associated occurrence time. The real-time processed data is sent periodically, during the discharge, through the JET real-time Asynchronous Transfer Mode (ATM) network, and stored in the JET scientific databases at the end of the pulse. Publishing the processed data in the ATM network enables it to be used for machine control purposes (e.g., the information about the line-integrated emissions of the hard X-rays in real time can be used to determine the lower hybrid current drive deposition before the main heating phase). Additionally, the real-time processed data is used for local calibration, using embedded radioactive sources to build the spectra of the 19 channels in real time. The acquired raw data is also stored in the digitizer modules' local memory and retrieved after the pulse to the JET database, where it can be post-processed offline to validate the real-time algorithms. The interface between the ATCA digitizers, the JET Control and Data Acquisition System (CODAS) and the JET real-time network is provided by the Multithreaded Application Real-Time executor (MARTe). From the experimental results it was concluded that it is possible to measure in real time the line integrals of both hard X-ray and gamma-ray emissions, covering an energy range from ∼200 keV to 8 MeV. This allows us to meet two of the major milestones: the ability to process and supply high-volume data rates in real time over a wide energy range.
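
The abstract above does not describe the FPGA algorithm itself, only that it yields an energy and an occurrence time per pulse. As a minimal sketch of that kind of output, assuming a simple threshold-based pulse-height analysis (a stand-in technique, not necessarily the method used at JET), the following Python function scans a digitized trace and reports one (time, peak amplitude) pair per pulse. The function name, the threshold, the 4 ns sample period, and the sample values are all hypothetical; peak amplitude is used here as an uncalibrated proxy for energy.

```python
def find_pulses(samples, threshold, sample_period_ns):
    """Return one (occurrence_time_ns, peak_amplitude) tuple per pulse.

    A pulse is a run of consecutive samples above `threshold`; its
    occurrence time is the time of the first sample in the run.
    """
    pulses = []
    in_pulse = False
    start = 0
    peak = 0
    for i, s in enumerate(samples):
        if s > threshold:
            if not in_pulse:
                in_pulse, start, peak = True, i, s
            elif s > peak:
                peak = s
        elif in_pulse:
            pulses.append((start * sample_period_ns, peak))
            in_pulse = False
    if in_pulse:  # trace ended while still inside a pulse
        pulses.append((start * sample_period_ns, peak))
    return pulses


# Hypothetical digitized trace containing two pulses.
trace = [0, 1, 0, 5, 9, 4, 0, 0, 3, 7, 12, 6, 1, 0]
events = find_pulses(trace, threshold=2, sample_period_ns=4)
print(events)  # [(12, 9), (32, 12)]
```

Reducing each pulse to a pair of numbers in this way is what makes the data volume small enough to publish during the discharge, while the raw trace can still be kept for offline validation.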

Articles published on Embedded Field Programmable Gate Array

Embedded FPGA developments in 130 nm and 28 nm CMOS for machine learning in particle detector readout

Dual-resonant scanning multiphoton microscope with ultrasound lens and resonant mirror for rapid volumetric imaging

Development of apparatus for mean-lifetime measurement of cosmic-ray muons using plastic scintillation detectors and FLASH-ADC/FPGA-based readout electronics

Demystifying the Soft and Hardened Memory Systems of Modern FPGAs for Software Programmers through Microbenchmarking

Hardware-Accelerated Real-Time Spectrum Analyzer With a Broadband Fast Sweep Feature Based on the Cost-Effective SDR Platform

Exploration of Word Width and Cluster Size Effects on Tree-Based Embedded FPGA Using an Automation Framework

FARD

A review on embedded field programmable gate array architectures and configuration tools

A Design Methodology of Digital Control System for MEMS Gyroscope Based on Multi-Objective Parameter Optimization

A Parallel Image Registration Algorithm Based on a Lattice Boltzmann Model

Design and Implementation of a New Wireless Carotid Neckband Doppler System with Wearable Ultrasound Sensors: Preliminary Results

Soft-Core Embedded-FPGA Based on Multistage Switching Networks: A Quantitative Analysis

A Control System and Streaming DAQ Platform with Image-Based Trigger for X-ray Imaging

Real-Time Processing System for the JET Hard X-Ray and Gamma-Ray Profile Monitor Enhancement

Precision Control of Modular Robot Manipulators: The VDC Approach With Embedded FPGA

Real-time algorithms for JET hard X-ray and gamma-ray profile monitor

Cosmic ray angular distribution employing plastic scintillation detectors and flash-ADC/FPGA-based readout systems

A heterodyne interferometer with periodic nonlinearities smaller than ±10 pm

Phase measurement of various commercial heterodyne He–Ne-laser interferometers with stability in the picometer regime

Configurable Embedded CPG-Based Control for Robot Locomotion

