Abstract
Whilst FPGAs have been integrated into cloud ecosystems, the strict constraints on mapping hardware to a spatially diverse distribution of heterogeneous resources at run-time make their utilization for shared multi-tasking challenging. This work analyzes the effects of such constraints on the achievable compute density, i.e. the efficiency with which the available compute resources are utilized. We hypothesize that static off-line partitioning and mapping of heterogeneous tasks can improve space sharing on the FPGA. The proposed approach allows the FPGA resource to be treated as a service from a higher level and supports multi-task processing without the need for low-level infrastructure support. To evaluate the effects of the existing constraints on our hypothesis, we implement a comprehensive suite of ten real high-performance computing tasks and produce multiple bitstreams per task for a fair evaluation of the various schemes. We then compare our proposed partitioning scheme to previous work in terms of achieved system throughput. Simulation results for large queues of mixed-intensity (compute- and memory-bound) tasks show that the proposed approach can provide more than \(3{\times }\) system speedup. Execution on the Nallatech 385 FPGA card for selected cases suggests that our approach can provide, on average, \(2.9{\times }\) and \(2.3{\times }\) higher system throughput for compute- and mixed-intensity tasks, respectively, while being \(0.2{\times }\) lower for memory-intensive tasks.
Highlights
We evaluate an alternative approach to partially reconfigurable regions (PRRs) by hypothesizing that a higher compute density can be achieved via static partitioning and mapping (SPM) of heterogeneous bitstreams
In addition to OpenCL, we use general high-level synthesis parameters to scale a task over multiple parallel compute units (CUs); multiple pipelines can be defined via a Single Instruction Multiple Data (SIMD) parameter, whilst the key compute-intensive loops can be unrolled via the UNROLL (U) parameter (see the sketch after these highlights)
The maximum throughput is defined by the largest bitstream that fits within the available Field Programmable Gate Array (FPGA) resources
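As a concrete illustration of these scaling knobs, the sketch below shows a hypothetical OpenCL kernel (not one of the paper's ten tasks) annotated with the Intel/Altera FPGA SDK for OpenCL attributes that correspond to the CU, SIMD and UNROLL parameters; the kernel body, the values 2 and 8, and the work-group size of 256 are illustrative assumptions only.

/* Hedged sketch: hypothetical kernel showing the CU, SIMD and UNROLL knobs.
 * The attribute values are illustrative, not the configurations used in the paper. */
__attribute__((num_compute_units(2)))            /* CU: replicate the whole kernel pipeline      */
__attribute__((num_simd_work_items(8)))          /* SIMD: vectorize work-items within a group    */
__attribute__((reqd_work_group_size(256, 1, 1))) /* required when a SIMD width is specified      */
__kernel void poly_eval(__global const float *restrict x,
                        __global const float *restrict coeff, /* 8 coefficients */
                        __global float *restrict y)
{
    int gid = get_global_id(0);
    float v = x[gid];
    float acc = 0.0f;
    #pragma unroll                               /* U: fully unroll the fixed-trip-count loop    */
    for (int k = 0; k < 8; k++)
        acc = acc * v + coeff[k];                /* Horner evaluation of a degree-7 polynomial   */
    y[gid] = acc;
}

Sweeping such CU/SIMD/U values is presumably how the multiple bitstreams per task mentioned in the abstract are generated, with each variant trading resource usage against throughput.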
Summary
Cloud computing offers users ubiquitous access to a shared pool of resources through centralized data centres. With increasing device sizes and efficiency for high-performance computing, there has been growing interest in integrating Field Programmable Gate Arrays (FPGAs) into data centres [5][11]. Their architecture and programming environment present a different resource-sharing model compared to software-programmable accelerators. Heterogeneous tasks in our context are defined by heterogeneity in resource utilization (compute, memory, logic) and execution time. The FPGA is partitioned into rectangular partially reconfigurable regions (PRRs), each of which is typically configured with a new bitstream via dynamic partial reconfiguration (DPR), independently of the processing going on in the other PRRs [17]. This provides independence in time to each PRR, so that a task A running in a PRR can be replaced by a task B as soon as task A finishes. The FPGA is also divided into multiple clock regions across both the vertical and horizontal axes, and crossing a region boundary requires custom logic implementation.
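To make the PRR time-sharing model above concrete, the following minimal C sketch (our illustration, not the paper's simulator) models a queue of tasks greedily assigned to whichever PRR frees up first, with a fixed assumed DPR reconfiguration overhead; the region count, task times and overhead value are purely illustrative.

/* Hedged sketch of PRR-style time sharing: each region runs one task at a time
 * and is reconfigured (fixed assumed DPR overhead) before the next task starts. */
#include <stdio.h>

#define NUM_PRR   4        /* assumed number of partially reconfigurable regions */
#define NUM_TASKS 10       /* length of the task queue                           */
#define DPR_MS    80.0     /* assumed per-reconfiguration overhead (ms)          */

int main(void)
{
    /* illustrative per-task execution times in ms */
    double exec_ms[NUM_TASKS] = {120, 340, 90, 210, 450, 60, 300, 150, 220, 180};
    double prr_free_at[NUM_PRR] = {0}; /* time at which each region becomes free  */
    double makespan = 0.0;

    for (int t = 0; t < NUM_TASKS; t++) {
        /* pick the region that frees up earliest (greedy assignment) */
        int best = 0;
        for (int r = 1; r < NUM_PRR; r++)
            if (prr_free_at[r] < prr_free_at[best])
                best = r;

        /* reconfigure the region via DPR, then run the task */
        prr_free_at[best] += DPR_MS + exec_ms[t];
        if (prr_free_at[best] > makespan)
            makespan = prr_free_at[best];
    }

    printf("Queue of %d tasks on %d PRRs finishes at %.1f ms\n",
           NUM_TASKS, NUM_PRR, makespan);
    return 0;
}

Under this baseline model, every task switch pays the reconfiguration overhead at run-time; the static partitioning and mapping approach evaluated in this work instead fixes the partitioning off-line.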