FPGA Card Research Articles

The subject of study in this article is the evaluation of the performance issues of cloud services implemented using FPGA technology. The goal is to improve the performance of cloud services built on top of multiple FPGA platforms known as FPGA-as-a-Service (FaaS). Task: to analyze the delays in communications between host computer and FPGA; propose the steps of development to reduce the delay and perform the evaluation of the response time for the FPGA-based accelerator depending on number of involved cards; consider the reliability aspect of such systems implemented using programmable logic. According to the tasks, the following results were obtained. The FPGA-as-a-Service where FPGA resources are provided through a set of hardware/software toolset is considered. The usage of queueing theory for cloud-based services is analyzed. The contribution of the parts of FPGA-as-a-Service to the final delay of the service is discussed. The process of modeling of work the services based on FPGA accelerator cards with use of Jackson's network is analyzed in detail. The model of the delays of FaaS that considers the parameters of accelerator FPGA cards is offered. The formula of the total response time of the service combined based on the response of the components of is obtained. The proposed steps of reduce data processing delays include increase the size of data blocks for processing in FPGA by each kernel, change the communication model with kernel from sequential to pipelined, following timing closure technique and use more FPGA accelerator cards in parallel to divide the enquiring delay. Based on the proposed model the evaluation of response time of FaaS was done. The advantage of the use of many FPGAs in parallel for same data processing task instead of implementation of requests thread for each accelerator card is shown. Conclusions. The main contribution of this study is a step forward to the modeling of FPGA-based services that can be used for FPGA-based artificial intelligence (AI) applications. It helps to improve the performance of the system by means of reducing the delays at different stages of requests processing. Another side of this result is the reliability aspect that is based on modified manner of service operation in case of use the proposed steps of system optimization. It helps to improve the processing of requests to FaaS. The proposed method is the next step after prototyping of such systems because it helps to turn the FaaS from the object for development to the tool for deployment of new technologies like AI applications.

Modern FPGA accelerators can be equipped with many high-bandwidth network I/Os, e.g., 64 x 50 Gbps, enabled by onboard optics or co-packaged optics. Some dozens of tightly coupled FPGA accelerators form an emerging computing platform for distributed data processing. However, a conventional indirect packet network using Ethernet's Intellectual Properties imposes an unacceptably large amount of the logic for handling such high-bandwidth interconnects on an FPGA. Besides the indirect network, another approach builds a direct packet network. Existing direct inter-FPGA networks have a low-radix network topology, e.g., 2-D torus. However, the low-radix network has the disadvantage of a large diameter and large average shortest path length that increases the latency of collectives. To mitigate both problems, we propose a lightweight, fully connected inter-FPGA network called OPTWEB for efficient collectives. Since all end-to-end separate communication paths are statically established using onboard optics, raw block data can be transferred with simple link-level synchronization. Once each source FPGA assigns a communication stream to a path by its internal switch logic between memory-mapped and stream interfaces for remote direct memory access (RDMA), a one-hop transfer is provided. Since each FPGA performs input/output of the remote memory access between all FPGAs simultaneously, multiple RDMAs efficiently form collectives. The OPTWEB network provides 0.71-μsec start-up latency of collectives among multiple Intel Stratix 10 MX FPGA cards with onboard optics. The OPTWEB network consumes 31.4 and 57.7 percent of adaptive logic modules for aggregate 400-Gbps and 800-Gbps interconnects on a custom Stratix 10 MX 2100 FPGA, respectively. The OPTWEB network reduces by 40 percent the cost compared to a conventional packet network.

FPGA Card Research Articles

Related Topics

Articles published on FPGA Card

Noctua2 Supercomputer

FPGA realization of an image encryption system using the DCSK-CDMA technique

Fast neural network inference on FPGAs for triggering on long-lived particles at colliders

Accelerating the Verification of Forward Error Correction Decoders by PCIe FPGA Cards

A FPGA-Based Architecture for Real-Time Cluster Finding in the LHCb Silicon Pixel Detector

Effective switchless inter-FPGA memory networks

Development and Validation of an optimized syndromes block for reed solomon decoder

MAC-based Artificial Neural network for voice command recognition

Method of QoS evaluation of FPGA as a service

AIOC: An All-in-One-Card Hardware Design for Financial Market Trading System

Design, optimization and Real Time implementation of a new Embedded Chien Search Block for Reed-Solomon (RS) and Bose-Chaudhuri-Hocquenghem (BCH) codes on FPGA Board

SmartFVM: A Fast, Flexible, and Scalable Hardware-based Virtualization for Commodity Storage Devices

High-Level Synthesis Implementation of an Embedded Real-Time HEVC Intra Encoder on FPGA for Media Applications

OPTWEB: A Lightweight Fully Connected Inter-FPGA Network for Efficient Collectives

Design and Implementation of Intelligent Controller Based on hybrid renewable energy source

Design, optimization and Real Time implementation of a new Embedded Chien Search Block for Reed-Solomon (RS) and Bose-Chaudhuri-Hocquenghem (BCH) codes on FPGA Board.

Evaluation of Static Mapping for Dynamic Space-Shared Multi-task Processing on FPGAs

Design and Implementation of Content-Based Image Retrieval on FPGA card

Recent diagnostic developments at ASDEX Upgrade with the FPGA implemented Serial I/O System “SIO2” and “Pipe2” DAQ periphery

Manipulación de 7 servomotores con FPGA iCEstick Evaluation Kit

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

FPGA Card Research Articles

Related Topics

Articles published on FPGA Card

Noctua2 Supercomputer

FPGA realization of an image encryption system using the DCSK-CDMA technique

Fast neural network inference on FPGAs for triggering on long-lived particles at colliders

Accelerating the Verification of Forward Error Correction Decoders by PCIe FPGA Cards

A FPGA-Based Architecture for Real-Time Cluster Finding in the LHCb Silicon Pixel Detector

Effective switchless inter-FPGA memory networks

Development and Validation of an optimized syndromes block for reed solomon decoder

MAC-based Artificial Neural network for voice command recognition

Method of QoS evaluation of FPGA as a service

AIOC: An All-in-One-Card Hardware Design for Financial Market Trading System

Design, optimization and Real Time implementation of a new Embedded Chien Search Block for Reed-Solomon (RS) and Bose-Chaudhuri-Hocquenghem (BCH) codes on FPGA Board

SmartFVM: A Fast, Flexible, and Scalable Hardware-based Virtualization for Commodity Storage Devices

High-Level Synthesis Implementation of an Embedded Real-Time HEVC Intra Encoder on FPGA for Media Applications

OPTWEB: A Lightweight Fully Connected Inter-FPGA Network for Efficient Collectives

Design and Implementation of Intelligent Controller Based on hybrid renewable energy source

Design, optimization and Real Time implementation of a new Embedded Chien Search Block for Reed-Solomon (RS) and Bose-Chaudhuri-Hocquenghem (BCH) codes on FPGA Board.

Evaluation of Static Mapping for Dynamic Space-Shared Multi-task Processing on FPGAs

Design and Implementation of Content-Based Image Retrieval on FPGA card

Recent diagnostic developments at ASDEX Upgrade with the FPGA implemented Serial I/O System “SIO2” and “Pipe2” DAQ periphery

Manipulación de 7 servomotores con FPGA iCEstick Evaluation Kit