Abstract

Emerging heterogeneous system architectures increasingly integrate general-purpose processors, GPUs, and other specialized computational units to deliver both power and performance benefits. While the motivation for building accelerator-rich systems is clear, the dispatching mechanisms that orchestrate these diverse computational components must be efficient in both performance and energy without sacrificing programmability. In this paper, we present an infrastructure composed of a general-purpose, packet-based hardware dispatching unit, the generic packet processing unit (GPPU), and an associated runtime that provides user-level access to GPPU objects such as packets, queues, and contexts. This design removes the cost of traditional user-to-kernel transitions, hides the low-level accelerator subtleties that hinder programming productivity, and addresses architectural obstacles such as managing the accelerators' unified virtual address space. We present the design and evaluation of our framework by integrating the GPPU infrastructure with two data-streaming accelerators, image filtering and matrix multiplication, tightly coupled to the ARMv8 architecture via unified virtual memory. Under scaling workloads, our proposed dispatching methods deliver up to $3.7\times$ performance improvement over baseline offloading and up to $4.7\times$ better energy efficiency.
